Abstract. Data assimilation (DA) holds considerable potential
for improving hydrologic predictions as demonstrated
in numerous research studies. However, advances in hydrologic
DA research have not been implemented in operational forecast
systems adequately or in a timely manner to improve the skill
of forecasts for better informed real-world decision making.
This is due in part to a lack of mechanisms to properly quan- 
tify the uncertainty in observations and forecast models in 
real-time forecasting situations and to conduct the merging 
of data and models in a way that is adequately efficient and 
transparent to operational forecasters. 


The need for effective DA of useful hydrologic data into 
the forecast process has become increasingly recognized in 
recent years. This motivated a hydrologic DA workshop in 
Delft, the Netherlands in November 2010, which focused 
on advancing DA in operational hydrologic forecasting and 
water resources management. As an outcome of the work- 
shop, this paper reviews, in relevant detail, the current sta- 
tus of DA applications in both hydrologic research and op- 
erational practices, and discusses the existing or potential 
hurdles and challenges in transitioning hydrologic DA research
into cost-effective operational forecasting tools, as
well as the potential pathways and newly emerging opportunities
for overcoming these challenges. Several related aspects are
discussed, including (1) theoretical or mathematical aspects
in DA algorithms, (2) the estimation of different
types of uncertainty, (3) new observations and their objec- 
tive use in hydrologic DA, (4) the use of DA for real-time 
control of water resources systems, and (5) the development 
of community-based, generic DA tools for hydrologic ap- 
plications. It is recommended that cost-effective transition 
of hydrologic DA from research to operations should be 
helped by developing community-based, generic modeling 
and DA tools or frameworks, and through fostering collab- 
orative efforts among hydrologic modellers, DA developers, 
and operational forecasters. 


1 Introduction 

It is essential to properly characterize and communicate
uncertainty in weather, climate, and hydrologic forecasts to be
able to effectively support emergency management and water
resources decision making (National Research Council,
2006). In hydrology, the importance of accounting for various
types of uncertainty involved in the prediction process
has been increasingly recognized in recent years (e.g.,
Pappenberger and Beven, 2006; Schaake et al., 2006; Brown,
2010). Uncertainty in hydrologic predictions can originate
from several major sources, including errors in the model
structure and model parameters, as well as model initial
conditions and hydrometeorologic forcing (e.g., Ajami et
al., 2007; Kavetski et al., 2006a, b; Salamon and Feyen,
2010). Effective quantification and reduction of these
uncertainties is necessary to generate forecast products
with accurate and actionable guidance on predictive
uncertainty that enables risk-based decision making (e.g.,
Pappenberger et al., 2008, 2011; Thielen et al., 2009;
Coccia and Todini, 2011; Weerts et al., 2011). The application
of data assimilation (DA), which optimally merges information
from model simulations and independent observations
with appropriate uncertainty modeling, has proved promising
in improving prediction accuracy and quantifying
uncertainty (e.g., McLaughlin, 2002; Liu and Gupta, 2007;
Reichle, 2008).

Over the last couple of decades, the abundance of new
hydrologic observations (in-situ or remotely sensed) has
stimulated a great deal of research into the use of these
observations for improving hydrologic predictions via model-data
infusion applications. Many of these applications rely on
assimilating traditional in-situ observations such as discharge,
soil moisture and snowpack measurements into hydrologic
models to improve predictions of streamflow and other
hydrologic variables (e.g., Seo et al., 2003, 2009; Vrugt et
al., 2005; Weerts and El Serafy, 2006; Clark et al., 2008a;
Komma et al., 2008; Moradkhani and Sorooshian, 2008;
Thirel et al., 2010a, b). In recent years, increasing availability
of satellite observations (e.g., van Dijk and Renzullo, 2011)
has generated unprecedented research activity into
assimilating these remotely sensed retrievals of various quantities,
such as soil moisture (e.g., Pauwels et al., 2001; De Lannoy
et al., 2007; Moradkhani and Sorooshian, 2008; Reichle
et al., 2008; Yirdaw et al., 2008; Crow and Ryu, 2009;
Kumar et al., 2009; Brocca et al., 2010; Montzka et al., 2011;
Peters-Lidard et al., 2011; Liu et al., 2012), snow water
equivalent and/or snow cover area or extent (e.g., Rodell and
Houser, 2004; Lee et al., 2005; Andreadis and Lettenmaier,
2006; Liston and Hiemstra, 2008; Zaitchik et al., 2008;
Durand et al., 2009; Kolberg and Gottschalk, 2010; Kuchment
et al., 2010; DeChant and Moradkhani, 2011a; De Lannoy
et al., 2012), surface water elevation (e.g., Alsdorf et al.,
2007; Montanari et al., 2009; Neal et al., 2009; Giustarini
et al., 2011), terrestrial water storage (Zaitchik et al., 2008)
and land surface temperature (Reichle et al., 2010), among
others. These DA applications were developed for a variety
of models ranging from physically-based land-surface
models (e.g., Albergel et al., 2008; Nagarajan et al., 2010)
to distributed hydrologic models (e.g., Clark et al., 2008a;
Rakovec et al., 2012a, b) and conceptual rainfall-runoff
models (e.g., Aubert et al., 2003; Seo et al., 2003, 2009;
Moradkhani et al., 2005a, b; Weerts and El Serafy, 2006),
hydraulic models (e.g., Shiiba et al., 2000; Madsen et al., 2003;
Neal et al., 2007; Schumann et al., 2009; Weerts et al.,
2010; Ricci et al., 2011), groundwater models (e.g., Valstar et
al., 2004; Franssen et al., 2011), coupled surface-subsurface
models (e.g., Camporese et al., 2009), biogeochemical
models (e.g., Chen et al., 2009), and sediment transport models
(e.g., Stroud et al., 2009). Less well-known is the application
of DA in the real-time control or operation of various
types of water resources systems (e.g., Bauser et al., 2010;
Schwanenberg et al., 2011).

In the meantime, DA algorithms are becoming increasingly
sophisticated, from simple rule-based, direct insertion
methods to advanced smoothing and sequential techniques
as well as variants of these techniques.
These include, for example, the one-, two-, three- and
four-dimensional variational algorithms (1D-, 2D-, 3D-, and
4D-VAR, e.g., Seo et al., 2003, 2009; Valstar et al., 2004),
extended or ensemble Kalman filtering (EKF or EnKF, e.g.,
Moradkhani et al., 2005b; Slater and Clark, 2006; Weerts and
El Serafy, 2006; Shamir et al., 2010), particle filtering (e.g.,
Moradkhani et al., 2005a; Weerts and El Serafy, 2006;
Matgen et al., 2010; DeChant and Moradkhani, 2012), H-infinity
filters (Wang and Cai, 2008), hybrid EnKF or 4D-VAR
approaches (e.g., Zhang et al., 2009), and other Bayesian
approaches (e.g., Reggiani and Weerts, 2008; Todini, 2008;
Reggiani et al., 2009). While most DA applications have
focused on updating hydrologic model states (e.g., soil
moisture and snow water equivalent), recent research has also
examined the benefits of estimating model states and model
parameters simultaneously (e.g., Moradkhani et al., 2005a, b;
Vrugt et al., 2005; Franssen and Kinzelbach, 2008; Lu et al.,
2010; Leisenring and Moradkhani, 2011; Nie et al., 2011) as
well as the possibility of model structure identification and
uncertainty estimation (e.g., Neuman, 2003; Bulygina and
Gupta, 2010; Hsu et al., 2009; Parrish et al., 2012).

It is worth noting that many of the hydrologic DA studies 
reported in the literature focused on advancing the theoretical 
development of DA techniques using, for example, identical 
or fraternal twin synthetic experiments (e.g., Andreadis et al., 
2007; Kumar et al., 2009; Crow and Ryu, 2009). This is es- 
pecially the case when it comes to assimilating satellite data. 
The synthetic experiments are useful for diagnostic and de- 
sign purposes such as assessing the impact of improper char- 
acterization of model and observation errors (e.g., Crow and 
Van Loon, 2006; Reichle et al., 2008) and evaluating the po- 
tential benefits of future satellite missions (e.g., Matgen et 
al., 2010). Nevertheless, despite the overwhelming research 
into hydrologic DA, only a few studies (e.g., Seo et al., 2003, 
2009; Thirel et al., 2010a, b; Weerts et al., 2010; DeChant 
and Moradkhani, 2011a, 2011b) formulated DA in an operational
setting and attempted to evaluate the performance gain
from DA in a forecast mode (e.g., as a result of better char- 
acterized initial conditions). The application of advanced DA 
techniques for improving hydrologic forecasts by operational 
agencies is even rarer, especially when it comes to assimilat- 
ing new observations from multiple sources across a range 
of spatiotemporal scales. In operational practice, the correc- 
tion of model inputs, states, initial conditions and parameters 
is often conducted in a rather empirical and subjective way 
(Seo et al., 2009). Generally speaking, hydrologic DA as an 
objective tool for reducing predictive uncertainty is not yet 
technically ready for operational hydrologic forecasting and 
water resources management. This is due in part to a lack 
of mechanisms to properly quantify the uncertainty in ob- 
servations and forecast models in real-time forecasting sit- 
uations and to conduct the merging of data and models in a 
way that is adequately efficient and transparent to operational 
forecasters. 

Nevertheless, the need for implementing effective DA in 
the forecast process to bridge the immense gap between the 
theory and operational practice is increasing. For example, 
Welles et al. (2007) reported that the hydrologic forecast- 
ing skill for some river basins at the US National Weather 
Service (NWS) River Forecast Centers has hardly improved 
over the past decade, with above flood-stage hydrologic forecasts
beyond three days having very poor skill. This highlights
the potential of, as well as the need for, assimilating new
observations into the operational hydrologic forecasting pro- 
cess to improve the predictive skill and extend the forecast 
lead time. For many parts of the world, remotely-sensed ob- 
servations (e.g., satellite images) are the only observations 
available and their optimal use in hydrologic forecasting via 
DA needs to be fully explored (National Research Council, 
2007). In meteorological and atmospheric sciences, steady 
improvements in numerical weather forecasting and climate
prediction over the last couple of decades have been enabled
to a certain degree by the development of community-based 
models and DA systems (e.g., Pappenberger et al., 2011). In 
the meantime, while satellite DA has not been adequately ex- 
plored in operational hydrology, the improvement of perfor- 
mance in operational weather forecast has been attributed (at 
least partially) to the incorporation of satellite data whose 
quality and spatiotemporal resolutions have been steadily im- 
proving in recent years (Rabier, 2005; Reichle, 2008). The 
hydrologic community should learn from the experiences of 
the meteorological and atmospheric communities by accel- 
erating the transition of hydrologic DA research into oper- 
ations to better utilize new observations and by developing 
community-supported, open-source modeling systems (e.g., 
Werner et al., 2012; McEnery et al., 2005) and DA tools (e.g., 
van Velzen and Verlaan, 2007; Kumar et al., 2008b; Weerts 
et al., 2010). 

It is important to note that in addition to community sup- 
port and the use of new sources of data (e.g., satellite-based 
products), other factors may have also contributed to the ap- 
parently greater advances of DA in operational meteorology 
than in operational hydrology. These include, among other 
reasons, the differences in the underlying physical system 
(i.e., atmospheric vs. land/hydrologic systems), types of data 
and procedures used by the forecasting systems as well as 
other historical/societal reasons, such as more funding and 
higher relevance of good forecasts (e.g., for aviation and 
military) for operational meteorology. For example, in con- 
trast to developments in operational meteorology, develop- 
ments (in both science and technology) in operational hy- 
drologic forecasting have taken place more on a local, na- 
tional and regional (e.g., in the case of trans-boundary rivers) 
rather than multinational or international scale. Also, hydro- 
logic forecasting systems often employ workflows with nu- 
merous models that represent different processes, all linked 
together to provide the best forecast for the up- and down- 
stream locations. This has rendered it less straightforward to 
apply consistent automated DA procedures across the hydro- 
logic forecasting systems. It is also interesting to note that 
historic developments have led to a difference in the hy- 
drologic forecast paradigms in the US and Europe. In the 
US, the flood forecasting procedure used by the NWS River 
Forecast Centers has traditionally involved manual modifications
(MODS; see for instance Seo et al., 2003; Smith et
al., 2003) of parameters and states, and this is still the case.
In Europe, by contrast, flood forecasting procedures include more
automated adjustments to the hydrologic forecasts, likely
because upgrades of flood forecasting systems have taken
place there since the early 2000s (see, for example,
Werner et al. (2009, 2012), who describe some of
the developments in flood forecasting systems/procedures).
Currently, hydrologic operational centers across Europe apply
automated methods like autoregressive-moving-average
(ARMA) or error correction methods, deterministic updating
methods (e.g., Moore, 2007), statistical (or post-processing)
correction techniques, and to a much lesser degree ensemble
data assimilation methods like the EnKF.

The assimilation of various types of observations into op- 
erational hydrologic forecasting offers ample research op- 
portunities and poses substantial challenges such as satellite 
retrieval algorithm development, bias correction, error es- 
timation, downscaling, model diagnosis and improvement, 
new DA algorithm development, efficient or effective per- 
formance evaluation, computational efficiency enhancement, 
among others. To address these challenges and identify po- 
tential opportunities in the context of improving operational 
hydrologic forecasting and water resources management via 
DA, an international workshop was held in Delft, the Nether- 
lands on 1-3 November 2010 (Weerts and Liu, 2011). The 
overall goal of the workshop was to develop and foster 
community-based efforts for collaborative research, devel- 
opment and synthesis of techniques and tools for hydrologic 
DA, and the cost-effective transition of these techniques and 
tools from research to operations. The workshop was at- 
tended by a mix of senior scientists and graduate students 
from a range of entities including universities, government 
agencies, operational centers, and nonprofit research insti- 
tutions from 12 different countries representing 23 different 
organizations. 

This paper reviews the status of DA applications in hy- 
drology from several important aspects and summarizes the 
discussion and findings from the workshop regarding the progress,
challenges, and opportunities in advancing DA applications
in operational forecasting. It is noted that the current
paper does not seek to perform a comprehensive assessment
of the state of the hydrologic DA research; rather, it
presents the knowledge, experience, and best judgments of 
the workshop participants in relevant areas of applying DA 
to operational hydrologic forecasting and water resources 
management, and makes corresponding recommendations 
for advancing these applications (where possible or relevant). 
Since DA applications for both hydrologic and land surface 
models are relevant for operational hydrologic forecasting 
or water resources management across various spatiotempo- 
ral scales, advances in both research areas will be discussed 
(albeit currently conceptual rainfall-runoff models are more 
commonly used in operational hydrologic forecasting than 
physically-based land surface models). 

The paper is organized as follows. Theoretical and mathe- 
matical aspects of hydrologic DA applications are reviewed 
in Sect. 2, followed by a discussion on the modeling and 
quantification of model and data uncertainties in DA appli- 
cations in Sect. 3. Section 4 discusses the challenges and 
new opportunities related to the objective utilization of new 
and existing sources of data (in-situ or remotely-sensed) for 
hydrologic DA applications. Section 5 is devoted to the dis- 
cussion of using DA for the real-time control and operation 
of water resources systems, an area of research and develop- 
ment less well-known to the general hydrologic community. 
The development and potential benefits of open-source and
community-based tools for hydrologic DA are presented in
Sect. 6. A summary of the discussions is presented in Sect. 7.

2 Theoretical aspects 

Broadly speaking, operational hydrologic forecasting 
presents three types of DA problems. The first is the state 
updating problem in which data such as stage, streamflow, 
rainfall, snow water equivalent, snow depth, potential or 
actual evapotranspiration, soil moisture and piezometric 
heads are assimilated into lumped or distributed hydrologic, 
hydraulic or land surface models to update the models’ 
dynamic states. The second is the parameter estimation or 
optimization problem, referred to often as calibration, in 
which the data are used to estimate or optimize the model 
parameters that may be considered static or time-varying. 
The third, termed the error updating problem, refers to using 
DA to revise the predictions of an error model representing 
the difference between the hydrologic forecasts and corre- 
sponding observations. These three types of DA problems 
are not mutually exclusive since a forecasting system can 
utilize any combination (see e.g., Young, 2002; Moradkhani 
et al., 2005b). The focus here is largely on the first and
third types, while referring the readers to the vast literature
on calibration for the second type (e.g., Beven and Binley,
1992; Vrugt et al., 2003).

Below, we briefly review the theoretical basis of the ex- 
isting DA techniques with the aim of identifying limitations, 
perceived or demonstrated, for application specifically in op- 
erational hydrologic forecasting. We then describe signifi- 
cant challenges from theoretical considerations in applying 
them in operational hydrologic forecasting. 

2.1 State updating 

The aim of this form of DA is to render the model states (as 
translated through the model dynamics) consistent with the 
observations. The most direct form of assimilating data into 
a model in operational hydrologic forecasting is the manual 
correction of the internal states of the model by human fore- 
casters based on their expert interpretation of the discrepancy 
between the recent model simulations and the observed data. 
While such techniques are widely practiced in operational 
forecasting, their effectiveness is scantily reported (Seo et 
al., 2009). It is reasonable to presume that the successful ap- 
plication of such manual techniques requires an experienced 
forecaster along with an interpretable and preferably sim- 
ple lumped hydrologic model. The latter condition is consid- 
ered necessary since it is probably impractical to manually 
apply hydrologically consistent corrections to a distributed 
hydrologic model or land surface model without simplifying
rules. The formulation of such simplifying rules can be
extended to provide an automated system of deterministic 
DA. Operational examples of such rule-based DA include the 
Grid2Grid rainfall-runoff model operational in the UK flood
forecasting centre (Cole et al., 2009) and the probability
distributed model (PDM, Moore, 2007) operational in the
Environment Agency National Flood Forecasting System, which
utilizes a flow partitioning to correct the upstream storage 
volumes in a grid-based distributed hydrologic model. Other 
schemes (e.g., Rungo et al., 1989) utilize similar concepts to 
assimilate data into hydraulic models. 

A more common framework is to consider the state space
model in Eqs. (1) and (2) (Evensen, 1994; Liu and Gupta,
2007),

$$x_{k+1} = M_{k+1}(x_k, \theta, u_{k+1}) + \eta_{k+1} \qquad (1)$$

$$z_{k+1} = H_{k+1}(x_{k+1}, \theta) + \varepsilon_{k+1}, \qquad (2)$$

where $M_{k+1}$ represents the forward model that propagates
the system states $x$ from time $t_k$ to $t_{k+1}$ in response to the
model input $u_{k+1}$ (with an error term $\zeta_{k+1}$) and parameters
$\theta$; $H_{k+1}$ is the observation function that relates model states
and parameters to observations $z_{k+1}$; $\eta_{k+1}$ denotes the model
error with mean $\bar{\eta}_{k+1}$ and covariance $Q_{k+1}$; and
$\varepsilon_{k+1}$ denotes the observation error with mean
$\bar{\varepsilon}_{k+1}$ and covariance $R_{k+1}$. In this notation,
the parameter vector $\theta$ is considered
to be time invariant and can be determined from either physical
principles or parameter calibration. Unknown or time
varying parameters can be incorporated into the state vector
to form a joint state-parameter estimation problem (e.g.,
Moradkhani et al., 2005a), with the dynamics of the model
parameters usually prescribed by a random walk process in
sequential algorithms. Such joint estimation may increase the
nonlinearity between the state and observation spaces and
exacerbate the computational challenges highlighted in this
section. An alternative approach is to conduct dual estimation,
where the unknown parameters and model states are
estimated separately in a parallel or interactive fashion (e.g.,
Moradkhani et al., 2005b; Vrugt et al., 2005). The predictive
distribution of $z_{k+1}$ (or more generally prediction $n$ time
steps ahead $z_{k+n}$) arises from the mapping of the stochastic
noise terms through Eqs. (1) and (2). The operational usefulness
of the forecast summarized by the predictive distribution
is dependent upon two main factors. The first of these
is the ability of the modeller to appropriately define the
distributions of $\zeta_{k+1}$, $\eta_{k+1}$ and $\varepsilon_{k+1}$ to characterize the various
sources of uncertainty, including errors in the model input,
measurement errors in the observations, conceptual discrepancies
between observed and model states, and the inadequacy
of the model in representing the dynamics of the system.
The second of these is the computational approximation
of the predictive distribution (see relevant discussions
below).
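
As a minimal sketch of Eqs. (1) and (2), the following Python fragment simulates a scalar state-space model. The linear reservoir used for M, the proportional discharge used for H, the Gaussian noise terms and all function and parameter names are illustrative assumptions and are not taken from any of the studies cited here.

```python
import random

def forward_model(x, theta, u):
    """M: toy linear reservoir; storage gains inflow u and loses theta * x."""
    return x + u - theta * x

def observation_operator(x, theta):
    """H: simulated discharge is a fixed fraction theta of storage."""
    return theta * x

def simulate(x0, theta, forcing, q_std, r_std, seed=0):
    """Generate states and observations from Eqs. (1) and (2) with Gaussian noise."""
    rng = random.Random(seed)
    x, states, observations = x0, [], []
    for u in forcing:
        # Eq. (1): propagate the state and add the model error eta
        x = forward_model(x, theta, u) + rng.gauss(0.0, q_std)
        # Eq. (2): map the state to observation space and add the observation error epsilon
        z = observation_operator(x, theta) + rng.gauss(0.0, r_std)
        states.append(x)
        observations.append(z)
    return states, observations

if __name__ == "__main__":
    states, observations = simulate(10.0, 0.5, [1.0, 0.0, 2.0], q_std=0.2, r_std=0.1)
    print(states, observations)
```

Repeating such a simulation with different noise realizations gives a sample from the predictive distribution, which is why the choice of the noise distributions matters as much as the model itself.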

For state updating, the state space model outlined above
may be solved as a filtering or smoothing problem. For
problems with nonlinear M and/or nonlinear H, the solution is
not trivial. A number of computational techniques are
available to provide approximate solutions. The commonly
applied methodologies for solving the problem via filtering
are nonlinear extensions to the Kalman filter (KF, Kalman,
1960). Three extensions to the KF are widely known, namely,
the extended Kalman filter (EKF, e.g., Georgakakos, 1986a,
b; Kitanidis and Bras, 1980), ensemble Kalman filter (EnKF,
Evensen, 1994) and unscented Kalman filter (UKF, Julier
and Uhlmann, 1997). The key difference between the
nonlinear Kalman filters lies in the methods of propagating the
expected value and covariance of the state space through the
nonlinear operators M and H. The EKF linearizes
(sometimes unrealistically) these operators based on local
derivatives, which are often difficult to compute reliably; hence,
the EKF technique has fallen out of favour (Da Ros and
Borga, 1997). The two remaining techniques propagate the
state space using a sample. In EnKF, the state space is
presumed to be multivariate Gaussian, while the UKF presumes
the state space is unimodal, symmetric and unbounded. Despite
the more relaxed assumptions of the UKF, it has received
little attention in the hydrologic literature to date. The EnKF
technique, however, has become the most frequently used
DA technique in the hydrologic community, due largely to
its easy implementation and its robustness in solving most
DA problems encountered in hydrologic applications.
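
To make the sample-based update concrete, the sketch below implements one analysis step of a perturbed-observation EnKF for a scalar state and a scalar observation. It is a simplified illustration under those assumptions; the function name `enkf_update` and its arguments are hypothetical, and operational systems work with state vectors and covariance matrices rather than scalars.

```python
import random
from statistics import mean

def enkf_update(ensemble, predicted_obs, z, r_var, rng):
    """Perturbed-observation EnKF analysis for a scalar state and observation.

    ensemble      -- forecast state of each member
    predicted_obs -- H applied to each member
    z, r_var      -- observation and its error variance
    """
    n = len(ensemble)
    x_bar = mean(ensemble)
    y_bar = mean(predicted_obs)
    # Sample covariances estimated from the ensemble itself
    cov_xy = sum((x - x_bar) * (y - y_bar)
                 for x, y in zip(ensemble, predicted_obs)) / (n - 1)
    var_y = sum((y - y_bar) ** 2 for y in predicted_obs) / (n - 1)
    gain = cov_xy / (var_y + r_var)
    # Each member assimilates its own perturbed copy of the observation
    return [x + gain * (z + rng.gauss(0.0, r_var ** 0.5) - y)
            for x, y in zip(ensemble, predicted_obs)]

if __name__ == "__main__":
    rng = random.Random(42)
    storages = [rng.gauss(10.0, 2.0) for _ in range(50)]
    discharges = [0.5 * s for s in storages]  # assumed observation operator
    print(mean(enkf_update(storages, discharges, 5.5, 0.04, rng)))
```

Because the gain is built from sample covariances, its quality depends directly on the ensemble size, which is the computational issue taken up next.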

The size of the sample used in the EnKF or UKF may 
prove computationally burdensome in an operational en- 
vironment. Typically, hundreds of ensemble members are 
needed for reliable updating without filter inbreeding (e.g., 
Franssen and Kinzelbach, 2008), although for land surface 
problems often smaller ensemble sizes (e.g., less than 100) 
are used. EnKF has also been applied on large scale problems 
which involve over $10^5$ unknown states and parameters. Another
limitation of the EnKF is that its optimal performance
is restricted to multi-Gaussian distributed states and param- 
eters. In the operational weather forecasting community, the 
local ensemble transform Kalman filter (LETKF) was intro- 
duced to overcome the issue of prohibitively large dimen- 
sionality by solving the analysis independently in a local re- 
gion around every model grid point using only local obser- 
vations (e.g., Szunyogh et al., 2008; Ott et al., 2004). Sim- 
ilarly, Sun et al. (2009) applied grid-based localization and 
a Gaussian mixture model (GMM) clustering technique to 
improve the performance of EnKF for states and parameters 
which are not multi-Gaussian distributed. Zhou et al. (2011) 
illustrated how a normal score transformation for both states 
and parameters improved the performance of EnKF drasti- 
cally for bimodal distributed parameter fields. Here, the nor- 
mal score transformation is made for each time step and 
each grid cell, using the simulated values from the ensem- 
ble to construct space-time specific probability density func- 
tions (PDFs). The EnKF is applied on these normal-score 
transformed values, which are then back transformed based 
on the established relationships between physical values and 
normal-score transformed values. It has yet to be investigated 
in more detail under which conditions, and for which types 
of problems, such transformations could significantly outperform
the classical EnKF. Alternatively, more sophisticated
transformations could also be explored. For all these extensions,
$\varepsilon_{k+1}$ and $\eta_{k+1}$ are considered symmetric, unimodal,
and unbounded zero-mean random variables which are
independent (both in time and of each other). This means that
great care should be taken in constructing M and H if there
is belief that systematic biases or phase errors in the data or
model exist (e.g., Dee, 2005; Crow and Van Loon, 2006). It
is also noted that the common assumption that the distribution
of the states is unbounded is rarely satisfied in real-world
problems; hence some mathematical transformation is often
necessary.
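
The normal score transformation described above can be sketched for a single grid cell and time step as follows. This is a rank-based illustration only: the plotting position, the linear interpolation in the back transformation and the function names are assumptions made here, not the procedure of Zhou et al. (2011).

```python
from statistics import NormalDist

def normal_score_transform(values):
    """Map ensemble values to standard-normal scores according to their rank."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    scores = [0.0] * n
    for rank, i in enumerate(order):
        # Plotting position (rank + 0.5) / n keeps probabilities inside (0, 1)
        scores[i] = NormalDist().inv_cdf((rank + 0.5) / n)
    return scores

def back_transform(score, values):
    """Map a normal score back to physical space via the sorted ensemble."""
    sorted_vals = sorted(values)
    n = len(sorted_vals)
    # Invert the plotting position, then clamp to the ensemble range
    pos = NormalDist().cdf(score) * n - 0.5
    pos = min(max(pos, 0.0), n - 1.0)
    lo = int(pos)
    hi = min(lo + 1, n - 1)
    frac = pos - lo
    return sorted_vals[lo] + frac * (sorted_vals[hi] - sorted_vals[lo])

if __name__ == "__main__":
    bimodal = [0.1, 5.0, 0.3, 4.8, 0.2]
    scores = normal_score_transform(bimodal)
    print(scores)
    print([back_transform(s, bimodal) for s in scores])
```

In this sketch updated scores that fall outside the range of the original ensemble are clamped to its extremes, so bounded physical variables remain bounded after the back transformation.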

Another flexible (but potentially more computationally ex- 
pensive) approach to solving the above filtering problem 
includes the sequential Monte Carlo (SMC) methods such 
as particle filtering (PF) (e.g., Arulampalam et al., 2002; 
Moradkhani et al., 2005a; Weerts and El Serafy, 2006; Noh 
et al., 2011; Plaza et al., 2012). Similar to the EnKF, particle
filtering evolves a sample of the state space forward using
the SMC method to approximate the predictive distribution.
However, unlike the KF-based methods, PF performs
updating on the particle weights instead of the state variables, 
which has an advantage of reducing numerical instability es- 
pecially in physically-based or process-based models. In ad- 
dition, PF is applicable to non-Gaussian state-space models. 
Earlier implementations of particle filtering relied on sequential
importance sampling (SIS), which can result in
sample degeneracy (where nearly all of the weight collapses onto a
single particle). To alleviate this problem, sampling importance
resampling (SIR) could be used (e.g., Moradkhani et al., 
2005a). Synthetic studies in other fields (e.g., Liu and Chen, 
1998; Fearnhead and Clifford, 2003; Snyder et al., 2008) 
showed that PF often needs more particles than other filtering 
methods and the required ensemble size can increase expo- 
nentially with the number of state variables. Typically, even 
for small problems with only a few unknown states and pa- 
rameters, hundreds or thousands of ensemble members may 
be needed for reliable characterization of the posterior PDF. 
Further exploration is needed to test the PF for problems 
with thousands of unknown states and parameters. Although 
PF may outperform EnKF when the number of particles is 
larger than a hundred in the case of conceptual hydrologic 
models (Weerts and El Serafy, 2006), the number of particles 
required for physically-based distributed hydrologic models 
may limit operational applications of PF. Fortunately, the re- 
cent development by Moradkhani et al. (2012) shows that, by 
combining PF with Markov Chain Monte Carlo (MCMC), 
improved performance for state-parameter estimation may
be achieved with small, manageable ensemble sizes. Also, 
DeChant and Moradkhani (2012) showed that when prop- 
erly coded and implemented PF can be computationally even 
more efficient than the EnKF and is more effective and robust 
for joint state-parameter estimation. 
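
The weight update and resampling steps of SIR can be sketched as below for a scalar observation. The Gaussian likelihood, the decision to resample at every step and the names `sir_update` and `gaussian_likelihood` are illustrative assumptions; the PF-MCMC variant mentioned above is not implemented here.

```python
import math
import random

def gaussian_likelihood(z, y, r_std):
    """Unnormalized Gaussian likelihood of observation z given prediction y."""
    return math.exp(-0.5 * ((z - y) / r_std) ** 2)

def sir_update(particles, weights, z, obs_operator, r_std, rng):
    """One SIR analysis step: reweight by the likelihood, then resample."""
    n = len(particles)
    # Update the weights (not the states) with the observation likelihood
    new_weights = [w * gaussian_likelihood(z, obs_operator(x), r_std)
                   for x, w in zip(particles, weights)]
    total = sum(new_weights)
    if total == 0.0:
        # All likelihoods underflowed; fall back to uniform weights
        new_weights = [1.0 / n] * n
    else:
        new_weights = [w / total for w in new_weights]
    # Effective sample size: small values indicate weight degeneracy
    ess = 1.0 / sum(w * w for w in new_weights)
    # Resample with replacement in proportion to the weights
    resampled = rng.choices(particles, weights=new_weights, k=n)
    return resampled, [1.0 / n] * n, ess

if __name__ == "__main__":
    rng = random.Random(7)
    particles = [rng.uniform(0.0, 20.0) for _ in range(200)]
    weights = [1.0 / 200] * 200
    posterior, _, ess = sir_update(particles, weights, 5.5, lambda x: 0.5 * x, 0.2, rng)
    print(sum(posterior) / len(posterior), ess)
```

The effective sample size returned by the sketch gives a simple diagnostic of the degeneracy problem: when it is close to one, a single particle carries almost all of the weight.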

The above techniques rely on approximating the evolu- 
tion of the distribution of the unknown model states over 
time. A number of variational DA techniques (e.g., Li and
Navon, 2001), which can be viewed as simplifications of the
KF (since they do not propagate the state covariance ma- 
trix explicitly) have been used operationally, primarily in 
the numerical weather prediction community (see e.g., Fis- 
cher et al., 2005; Lorenc and Rawlins, 2005). In hydrol- 
ogy, Seo et al. (2003, 2009) explored the use of variational 
DA in experimental operational streamflow forecasting and 
demonstrated large potential gains over manual runtime ad- 
justments during operations. Lee et al. (2011, 2012) explored 
the use of variational methods for streamflow and/or soil 
moisture assimilation in a distributed hydrologic modeling 
framework with a high dimensional state space. The variational
techniques can be particularly appealing when the covariance
matrix is large (for example, corresponding to more
than $10^5$ unknown states and/or parameters) such that defining
meaningful error covariance matrices is impractical in
operational applications. 
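
For intuition only, consider the scalar, linear special case in which a background state $x_b$ with error variance $B$ is combined with a single observation $z = h x + \varepsilon$ of error variance $R$. The symbols $x_b$, $B$, $h$ and $x_a$ are introduced here for illustration and do not appear in the studies cited above. The variational analysis $x_a$ minimizes a cost function that penalizes departures from both the background and the observation:

```latex
J(x) = \frac{(x - x_b)^2}{2B} + \frac{(z - h x)^2}{2R},
\qquad
\frac{dJ}{dx} = 0
\;\Rightarrow\;
x_a = x_b + \frac{B h}{h^2 B + R}\,(z - h x_b).
```

In this special case the minimizer coincides with the Kalman filter update. In high-dimensional problems the minimum is instead found iteratively, so the updated state covariance need not be formed explicitly.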

It is important to note that all DA algorithms rely on the 
fundamental basis of Bayesian theory for updating the model 
states or parameters. For a more detailed discussion on the 
various types of hydrologic DA problems and techniques, the 
readers are referred to Liu and Gupta (2007). 
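
In the notation of Eqs. (1) and (2), this Bayesian basis is commonly written as a two-step recursion: a prediction step that propagates the state distribution through the model, and an update step that conditions it on the new observation. The shorthand $z_{1:k}$ for all observations up to time $t_k$ is an assumption made here for compactness, and the parameters $\theta$ are omitted for readability.

```latex
p(x_{k+1} \mid z_{1:k}) = \int p(x_{k+1} \mid x_k)\, p(x_k \mid z_{1:k})\, dx_k,
\qquad
p(x_{k+1} \mid z_{1:k+1}) =
\frac{p(z_{k+1} \mid x_{k+1})\, p(x_{k+1} \mid z_{1:k})}
     {\int p(z_{k+1} \mid x_{k+1})\, p(x_{k+1} \mid z_{1:k})\, dx_{k+1}}.
```

The Kalman-type filters approximate these distributions through their first two moments, whereas particle filters approximate them with weighted samples.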

2.2 Error updating 

The error updating problem can be thought of as assimilating 
the latest observation and its corresponding prediction to in- 
form the predictive distribution of the errors in future model 
predictions with respect to the (to-be) observed data. This 
is not to be confused with the propagation of the error co- 
variance matrices in state updating applications. Rather, error 
updating discussed here refers to the use of DA to condition 
the predictions of an error model representing the difference 
between an, often deterministic, hydrologic forecast and the 
corresponding observations (see e.g., Smith et al., 2012, this 
issue). In other words, data is not assimilated into the hy- 
drologic model with the aim of producing improved fore- 
casts but is used to inform the prediction of future discor- 
dances between the model forecast and future observations. 
Operational examples include the use of ARMA time series 
models to describe transformed residuals in flood forecasting
systems (e.g., Broersen and Weerts, 2005) and a stochastic
multiplicative correction (e.g., Lees et al., 1994; Smith et
al., 2012, this issue). In both these cases, predictions can be 
computed using linear filtering. A wide variety of alternative 
error models may be found in the literature (Seo et al., 2006; 
Weerts et al., 2011; among others) and the problem may be 
formulated in a Bayesian context (e.g., Krzysztofowicz and 
Maranzano, 2004). 
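
As a minimal sketch of error updating by linear filtering, the
Python fragment below fits a first-order autoregressive, AR(1),
model to past residuals and uses it to correct a deterministic
forecast. The AR(1) form, the function names, and the definition
of the residual as observation minus simulation are assumptions
made for illustration; the operational schemes cited above use
ARMA models on transformed residuals and are not reproduced here.

```python
def fit_ar1(residuals):
    """Least-squares lag-1 coefficient of a zero-mean residual series."""
    num = sum(a * b for a, b in zip(residuals[1:], residuals[:-1]))
    den = sum(a * a for a in residuals[:-1])
    return num / den if den > 0 else 0.0


def corrected_forecast(model_forecast, last_residual, phi):
    """Add the predicted residual phi**k * e_t at lead times k = 1, 2, ..."""
    return [q + (phi ** k) * last_residual
            for k, q in enumerate(model_forecast, start=1)]


# Hypothetical residuals (observed minus simulated flow) up to forecast time.
past_residuals = [1.0, 0.5, 0.25, 0.125]
phi = fit_ar1(past_residuals)
updated = corrected_forecast([10.0, 10.0], last_residual=2.0, phi=phi)
```

The hydrologic model itself is left untouched: only the predicted
discrepancy is added to its output, and the correction decays
with lead time when |phi| < 1.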

To effectively utilize error updating to produce reliable 
forecasts (in both the probabilistic and pragmatic sense), the 
error model must provide an appropriate description of the 
difference between the observations and model predictions. 
Systematic or temporally varying bias must be removed as 
much as possible. This may prove particularly challenging 
if it is induced by forecasted forcing such as precipitation.
Many error models rely on temporal correlation within the 
residuals. Such correlations may, however, be low at key 
locations such as the rising limbs of hydrographs (Todini, 
2008). As such, error modeling should consider flow depen- 
dence of the correlation structure. Also, extreme situations 
such as floods may reveal previously unknown shortcom- 
ings in the hydrologic or hydraulic models. In such situa- 
tions it may be questionable if the error model continues to 
be an appropriate description of the difference between the 
observations and model predictions. 

2.3 Challenges and opportunities 

Besides the challenges discussed above, many other theoret- 
ical difficulties are present and need to be effectively ad- 
dressed. First of all, land surface and rainfall-runoff pro- 
cesses are highly nonlinear and their mathematical models 
are often not continuously differentiable. Also, the soil mois- 
ture states of a model are rarely, if ever, observed directly 
in the real world, and hence their departures from reality 
may only be inferred from observations of stage or streamflow,
or satellite-based microwave observations. The river
stage, for example, is a product of spatiotemporal integration 
of not only snow accumulation and ablation, and rainfall- 
runoff processes but also various hydraulic processes ex- 
pressed through the morphology of the channel. Given these, 
it is necessary that the DA techniques for operational hy- 
drologic forecasting be able to handle nonlinear dynamics, 
nonlinear observation functions, and a mix of amplitude and 
phase errors that operate over a wide range of spatiotemporal
scales (Liu et al., 2011). While progress has been reported
in the literature since the mid-1970s, it remains unclear
under what conditions and to what degree
the existing DA techniques can handle the different
DA problems encountered in operational hydrologic
forecasting.

Precipitation and streamflow, arguably the two most im- 
portant variables in operational streamflow forecasting, are 
skewed and heteroscedastic, and accurate statistical model- 
ing of their measurement uncertainties is still a challenge. 
Also, to benefit from accurate modeling of the measure- 
ment uncertainty, the uncertainty in the model dynamics and 
physics will have to be modeled with comparable accuracy. 
It is expected, however, that there is a practical limit to the 
complexity of such uncertainty modeling (see more detailed 
discussion on modeling of uncertainties in Sect. 3). As with 
any optimal estimation technique, the optimality of the DA
techniques is realized only if the observations and the mod- 
els are not biased in the mean sense. As such, bias correction 
must precede or accompany DA to realize the purported optimality
(e.g., De Lannoy et al., 2007; Ryu et al., 2009). As
described above, the DA problem for state updating may involve
multiple model components, such as those for rainfall-runoff,
evapotranspiration, snow, and hydrologic or hydraulic
routing, that may operate over a wide range of spatiotemporal
scales, with each model component contributing a different 
degree of freedom at its own dominant scale to the overall 
problem. Directly solving such a large DA problem may be 
impractical, and it may be necessary to decompose it into 
smaller problems, but without compromising the quality of 
the solution in any significant way. 

For operational forecasting, performance for extreme 
events is of particular importance. It is in such infrequently 
observed events that the mathematical DA techniques may
prove superior to purely statistical techniques, which require 
sizable historical data. Broadly speaking, statistical tech- 
niques may be seen as an extreme end of mathematical DA 
where statistical models are used to describe the dynam- 
ics. Optimally balancing physical-dynamical and statistical 
modeling in DA under different hydrologic conditions, e.g., 
from drought to flooding, is a complex question that requires 
much additional research. Since extreme events naturally oc- 
cur rarely, the ability to assess the usefulness of DA in im- 
proving the forecasting of extreme events may be limited by 
the length of available record. This situation is exacerbated if 
the system being forecasted has undergone or is undergoing 
significant changes such that the relationship between the ob- 
servations and the model output cannot be believed to be con- 
stant in time (or space). Caution should therefore be applied 
in making too strong an assumption about the properties of 
any DA scheme. Also, with such limitations, it is apparent 
that the optimal DA scheme (as judged against some criteria 
that the modeller can assess) derived from the historic data 
may not be optimal for forecasting into the future. 

3 Modeling of uncertainties 

Model simulations or predictions are subject to various un- 
certainties and sources of forecast errors. Uncertainties may 
stem from model initialization, due to incomplete data cov- 
erage, observation errors, or an improper DA procedure. 
Other sources of uncertainty in prediction are associated with 
model input (i.e., forcing data) and imperfection of the model 
structure itself, due to the parameterization of physical pro- 
cesses or unresolved scale issues. Even with an assumption 
of having a perfect model structure, the estimates of model 
parameters could also be uncertain given the observational 
uncertainties that affect the model calibration. Therefore, the 
“optimality” of a DA scheme depends critically on the re- 
liability of error estimates for the inputs and the model it- 
self, as well as proper consideration of interdependencies and 
interaction of the uncertain model components and/or ob- 
servations (e.g., Crow and Van Loon, 2006; Moradkhani et 
al., 2006; Hong et al., 2006). This is because the weight as- 
signed to observations in a DA scheme is computed based 
on estimates of the relative error in the model and in the ob- 
servations. The discussion now turns to critically examine 
different methods to estimate these errors. 
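
The dependence of the observation weight on the relative errors
can be illustrated with a scalar update of Kalman type. This is
a sketch that assumes unbiased, mutually uncorrelated errors with
known variances; the function and variable names are hypothetical.

```python
def scalar_update(forecast, forecast_var, obs, obs_var):
    """Blend a model forecast and an observation by their error variances."""
    # Weight given to the observation: large when the model error dominates.
    gain = forecast_var / (forecast_var + obs_var)
    analysis = forecast + gain * (obs - forecast)
    analysis_var = (1.0 - gain) * forecast_var
    return analysis, analysis_var, gain


# Equal error variances: the analysis lies halfway between the two.
balanced = scalar_update(10.0, 4.0, 12.0, 4.0)
# Observation judged nine times less reliable: it moves the state little.
distrusted = scalar_update(10.0, 1.0, 12.0, 9.0)
```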

3.1 Uncertainty in model inputs - the problem of precipitation

Precipitation is often viewed as the most uncertain model in- 
put (e.g., Huard and Mailhot, 2006; Kavetski et al., 2006a, 
b; Bardossy and Das, 2008; Renard et al., 2010). This is 
because precipitation typically has short correlation length 
scales (in both space and time), and the reliability of basin- 
average (or gridded) precipitation estimates is constrained 
by the poor spatial representativeness of most station net- 
works (e.g., Willems, 2002; Clark and Slater, 2006; Bardossy 
and Das, 2008; Villarini and Krajewski, 2008; Volkmann 
et al., 2010). The uncertainty in precipitation is difficult to 
reduce, as alternative methods for estimating precipitation, 
such as radar, satellite, and numerical weather prediction 
models, have errors that are at least as large as those in 
many operational station networks (e.g., Hossain and Huff- 
man, 2008; Volkmann et al., 2010), and substantial errors 
in basin-average rainfall still exist in well-instrumented wa- 
tersheds. Given the impact of errors in precipitation on the 
model response (e.g., Bardossy and Das, 2008), obtaining 
more reliable estimates of precipitation uncertainty is critical 
to the success of DA applications. 

In a DA context, uncertainty in precipitation is quantified 
either by stochastically perturbing the precipitation inputs or 
through conditional simulation methods. The stochastic per- 
turbation approach is the most common (e.g., Steiner, 1996; 
Crow and Van Loon, 2006; Pauwels and De Lannoy, 2006; 
Weerts and El Serafy, 2006; Clark et al., 2008a; Komma et 
al., 2008; Turner et al., 2008; Pan and Wood, 2009). In this 
approach the size of the precipitation perturbations is typi- 
cally based only on order-of-magnitude considerations. For 
example, Reichle et al. (2002) assumed the standard devi- 
ation of precipitation errors is equal to 50 % of the precip- 
itation total at each model time step. However, uncertainty 
in precipitation estimates tends to vary both spatially and 
temporally (e.g., Tian and Peters-Lidard, 2010; Sorooshian 
et al., 2011), and therefore estimates of precipitation uncer- 
tainty from such order-of-magnitude based approaches may 
be statistically unreliable. Conditional simulation methods, 
which condition precipitation estimation on observations of 
precipitation (e.g., from a station network; see Rakovec et 
al. (2012b, this issue) for an example on this) and/or other 
information (e.g., topography), have the potential to pro- 
vide more reliable uncertainty estimates (e.g., Clark and 
Slater, 2006; Gotzinger and Bardossy, 2008). For example, 
the regression-based ensemble spatial interpolation methods 
used by Clark and Slater (2006) provide an error estimate 
(i.e., the spread of the precipitation ensemble) that is connected
to the error in the regression equations. This approach
- and others, such as geostatistically based conditional simulation
techniques (Gotzinger and Bardossy, 2008) - provides
statistically reliable precipitation ensembles by explicitly
linking the error in precipitation estimates to the adequacy
of the station network. While conditional simulation
methods can be data-intensive to parameterize and computationally
expensive to run (McMillan et al., 2011), their potential
to provide statistically reliable uncertainty estimates
suggests that the implementation costs may be worthwhile. 
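
The stochastic perturbation approach can be sketched as follows.
The fragment generates a precipitation ensemble whose standard
deviation equals a fixed fraction (here 50 %) of the precipitation
total at each time step. The multiplicative lognormal noise with
unit mean is an assumption chosen here to keep values non-negative;
it is not claimed to be the formulation of any study cited above,
and the function name is hypothetical.

```python
import math
import random


def perturb_precipitation(precip, n_members, rel_std=0.5, seed=None):
    """Return n_members perturbed copies of a precipitation series."""
    rng = random.Random(seed)
    # Lognormal parameters giving mean 1 and coefficient of variation rel_std.
    sigma2 = math.log(1.0 + rel_std ** 2)
    mu = -0.5 * sigma2
    sigma = math.sqrt(sigma2)
    return [[p * rng.lognormvariate(mu, sigma) for p in precip]
            for _ in range(n_members)]


# Hypothetical three-step series (mm per time step), 5000 members.
ensemble = perturb_precipitation([0.0, 2.0, 10.0], n_members=5000, seed=1)
```

Note that dry time steps remain dry in every member, which is one
of the known limitations of purely multiplicative perturbations.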

In large-scale hydrologic modeling and DA applica- 
tions, statistically reliable quantitative precipitation estimates 
(QPEs) may need to be generated based on outputs from nu- 
merical weather prediction (NWP) models, often aided with 
other available sources of precipitation information (e.g., sta- 
tions, radars, and satellites). Statistical post processing (e.g., 
downscaling and bias correction) of NWP-based precipita- 
tion estimates is commonly practiced to close the scale gap 
between NWP outputs and hydrologic applications and to re- 
duce the systematic bias in the NWP precipitation estimates, 
while at the same time reproducing the observed local-scale 
space-time variability in precipitation and other forcing vari- 
ables (e.g., Clark et al., 2004a, b; Piani et al., 2010; Rojas et 
al., 2011). Ehret et al. (2012), however, cautioned against the use of
bias correction on precipitation and other outputs from global
and regional circulation models for hydrologic applications
and proposed that improving the simulations from these models
(e.g., via increased resolution and ensemble predictions)
is the most promising solution for reducing the uncertainty 
in precipitation predictions from these models. 

3.2 Uncertainty in the model itself 

Here we define a model as a simplified representation of re- 
ality, in which the structure of the model includes the se- 
lection of model equations and the time stepping scheme 
used to integrate the model equations forward in time (e.g., 
Clark and Kavetski, 2010). Model structure uncertainty is as- 
sociated with the assumptions that act on the development 
of the model conceptualization and mathematical structure. 
An unfortunate truth in model development is that no mat- 
ter how many resources are invested in developing a partic- 
ular model, there remain conditions and situations in which 
the model is unsuitable for giving accurate forecasts. Many of
the model equations contain adjustable parameters that pro- 
vide scope to apply the model in different regions, and/or 
to improve the model predictions - for example, hydraulic 
conductivity can be adjusted to represent different soil types 
and/or to compensate for the reality that the Richards model 
equation is often applied at a spatial scale that is much larger 
than the scale at which the model equation was derived. In 
this context model error (or model adequacy) is the fidelity 
of the model response to external forcing, and includes both 
errors in the model structure (the equations and time stepping 
scheme) and errors in the model parameter values. Quantify- 
ing model error is an extremely difficult proposition, because 
the many different sources of uncertainty in a model inter- 
act in complex ways. The community has adopted four main 
approaches to quantify model error (as discussed below). 

The first approach, similar to the stochastic perturbation
approach for precipitation, is to stochastically perturb
the model state variables (e.g., Reichle et al., 2002; Vrugt
et al., 2006; Clark et al., 2008a). Again, these perturbations
are based on order-of-magnitude considerations, and
may therefore be statistically unreliable. This approach addresses
the model uncertainty term η in Eq. (1) by adding
random perturbations to the model physics, while the other
three approaches to be discussed below aim to represent
the model uncertainty through quantifying the uncertainties
in model parameters and/or the model structure
with more sophisticated approaches (in addition to adding 
random perturbations). 

The second approach is to use inverse methods to in- 
fer probability distributions for each model parameter (e.g., 
Beven and Binley, 1992; Vrugt et al., 2003). However, in this
approach it is typically assumed that the initial condition, 
model structure and the model inputs are perfect, which can 
lead to model parameters being tweaked to unrealistic values 
to compensate for errors in model structure and model inputs 
(Thyer et al., 2009). Moreover, the inverse problem is often 
poorly constrained, which can result in parameters in one part 
of the model being assigned unrealistic values to compensate for
unrealistic parameters in another part of the model (Beven, 
2006). Such unrealistic inference of probability distributions 
of model parameters can lead to situations where the right 
answers are obtained (e.g., reasonable total uncertainty es- 
timates) for all of the wrong reasons. In the groundwater 
hydrologic community, attempts have been made to address 
these parameter identifiability issues via Monte Carlo type 
inverse modeling techniques, which apply 4D-VAR tech- 
niques on a large number of stochastic realizations of one 
or more spatially distributed parameter fields (Sahuquillo et 
al., 1992; Franssen et al., 2003). 

The third approach is to modify the states and parameters 
simultaneously and quantify the uncertainty associated with 
them all within a sequential (or recursive) DA framework 
(e.g., Moradkhani et al., 2005b; Vrugt et al., 2005; Naev- 
dal et al., 2003). In this approach, the real time updating 
of state variables and parameter values allows the model to
more closely reproduce the observed system response given 
the updating procedure implemented (i.e., linear updating in 
ensemble Kalman filtering vs. sequential Bayesian updating 
and resampling in particle filtering) at each observation time 
(Moradkhani and Sorooshian, 2008). Various applications of 
such methods in streamflow forecasting, soil moisture, snow 
water equivalent estimation, groundwater flow modeling and 
flood inundation mapping have been reported (e.g., Franssen 
et al., 2008; Matgen et al., 2010; Leisenring and Moradkhani, 
2011; Montzka et al., 2011). 
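
A minimal sketch of simultaneous state and parameter updating
with the linear (ensemble Kalman) update is given below. It
assumes a scalar state that is observed directly, a single
parameter, and perturbed observations. It uses the
state-augmentation form of the update rather than the specific
schemes of the studies cited above, and all names are hypothetical.

```python
import random


def enkf_joint_update(states, params, obs, obs_var, rng):
    """Update state and parameter ensembles with one observation of the state."""
    n = len(states)
    mean_x = sum(states) / n
    mean_p = sum(params) / n
    var_x = sum((x - mean_x) ** 2 for x in states) / (n - 1)
    cov_px = sum((p - mean_p) * (x - mean_x)
                 for p, x in zip(params, states)) / (n - 1)
    denom = var_x + obs_var
    if denom == 0.0:
        # Degenerate ensemble and error-free observation: nothing to weight.
        return list(states), list(params)
    gain_x = var_x / denom   # weight on the innovation for the state
    gain_p = cov_px / denom  # the parameter is updated through its covariance
    new_states, new_params = [], []
    for x, p in zip(states, params):
        # Each member sees the observation plus its own noise realization.
        innovation = obs + rng.gauss(0.0, obs_var ** 0.5) - x
        new_states.append(x + gain_x * innovation)
        new_params.append(p + gain_p * innovation)
    return new_states, new_params


rng = random.Random(0)
new_x, new_theta = enkf_joint_update(
    [1.0, 2.0, 3.0], [0.1, 0.2, 0.3], obs=2.5, obs_var=0.25, rng=rng)
```

The parameter is never observed; it is adjusted only to the
extent that it co-varies with the observed state across the
ensemble, which is why the quality of the ensemble spread matters.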

The fourth and final approach to quantify model error is to 
use multi-model ensembles (e.g., Georgakakos et al., 2004). 
However, obtaining reliable uncertainty estimates with multi- 
model ensembles relies entirely on chance. The selection of 
individual models in the multi-model ensemble is habitually 
rather ad hoc, with insufficient attention given to whether the 
differences among the individual models represent the uncertainty
in simulating natural processes. Many models share a
similar heritage, and it is common for different models to get 
the wrong answers for the same reasons. 

3.3 Challenges and opportunities 

As discussed above, our capabilities for quantifying model 
error still show important deficits. This section outlines the
major challenges and suggests some potential ways in which 
we as a community can improve uncertainty estimation. 

3.3.1 Disentangling different sources of uncertainty 

A fundamental challenge for quantifying errors in model 
inputs and in the model itself is that the different error 
sources are extremely difficult to disentangle (e.g., Kuczera 
et al., 2006). This includes the uncertainty with respect to 
the values for a large number of different parameters, possibly
showing strong mutual correlation, as is typically
the case for land surface models. Indeed, many attempts 
to estimate model error effectively lump together different 
sources of error. For example, the probabilistic parameter in- 
ference methods, such as the generalized likelihood uncer- 
tainty estimation (GLUE) methodology, the Shuffled Com- 
plex Evolution Metropolis (SCEM) algorithm and many of 
the inverse modeling methods in vadose zone and ground- 
water hydrology, effectively map all sources of model uncer- 
tainty onto the model parameters (Beven and Binley, 1992; 
Vrugt et al., 2003).

There are a few promising approaches to disentangle the 
different sources of uncertainty. The first is the simultane- 
ous optimization and data assimilation (SODA) algorithm, 
in which sequential DA methods are used as part of the 
probabilistic parameter inference (Vrugt et al., 2005). In this
case, stochastic state perturbations and state updates are used
to account for model error, reducing the extent to which
model error contaminates the inference of the model parameters
(e.g., Clark and Vrugt, 2006). The second approach is
the Bayesian Total Error Analysis (BATEA) methodology, 
which specifies error models for all sources of uncertainty 
and uses available data to refine the error models (Kavet- 
ski et al., 2006a, b; Kuczera et al., 2006). The effectiveness 
of BATEA critically depends on the availability of informa- 
tive prior information for each individual source of uncer- 
tainty (Renard et al., 2010). Given the potential for differ- 
ent sources of uncertainty to compensate for each other (e.g., 
Crow and Van Loon, 2006), the inference problem may be 
ill-posed (e.g., Kuczera et al., 2006). The third approach is 
the online dual state and parameter estimation within a DA 
framework (e.g., Moradkhani et al., 2005b). As demonstrated 
in various studies (Franssen and Kinzelbach, 2008; DeChant 
and Moradkhani, 2012), these methods rely on sequential 
Bayesian estimation that seems better able to benefit from 
the temporal organization and structure of information content
in the data, achieving better conformity of the model
output with observations. Moreover, such an approach considers
the interdependencies among state variables and parameters
concurrently given the highly interactive nature of
these model components. Recent developments by Moradkhani
et al. (2012) and Leisenring and Moradkhani (2012)
show how DA using PF combined with MCMC and a variable
variance multiplier (VVM) can considerably enhance the accuracy
of online state-parameter estimation with more reliable
quantification of uncertainty. We anticipate that further
developments in all of these areas will improve capabilities 
to produce more meaningful uncertainty estimates. 

3.3.2 Constraining the inference problem 

Another challenge - as mentioned above - is that in- 
verse methods for parameter inference are often poorly con- 
strained, resulting in unrealistic parameter values. The ob- 
jective functions typically used for parameter inference are 
based on aggregate measures of model performance (e.g., the 
sum of squared differences between simulated and observed 
streamflow), and the individual components of the model 
are rarely subject to scientific scrutiny (Kuczera and Franks, 
2002; Gupta et al., 2008; McMillan et al., 2011). The infer- 
ence of the different sources of uncertainty can therefore be 
improved through more intelligent use of the available data 
(e.g., Gupta et al., 1998, 2008), for example, by separately 
examining amplitude and phase errors (e.g., Liu et al., 2011). 

3.3.3 Generating efficient multivariate ensembles 

Although not commonly practiced, parameter uncertainty 
may be accounted for in DA by generating an initial parameter
ensemble. However, given the complex interrelationships
among model parameters, it can be a challenge to efficiently
sample from multivariate distributions (e.g., of
multiple uncertain soil and vegetation parameters) in hydro- 
logic applications to generate parameter ensembles that are 
physically and dynamically consistent. Similar challenges 
may exist for generating multivariate forcing ensembles (e.g., 
for precipitation and temperature). Therefore, further re- 
search into developing appropriate multivariate statistical 
methods for hydrologic parameters and forcing variables 
should improve our ability to address relevant uncertainties 
in DA applications. 
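
As a simple illustration of drawing a parameter ensemble that
respects an assumed dependence, the sketch below samples two
parameters from a bivariate normal distribution with a prescribed
correlation. The bivariate normal form, the two generic
parameters, and all numerical values are assumptions for
illustration only; real soil and vegetation parameters are often
bounded and non-Gaussian and would need transformations or other
methods.

```python
import math
import random


def sample_correlated_pair(mean_a, std_a, mean_b, std_b, rho,
                           n_members, seed=None):
    """Draw (a, b) pairs from a bivariate normal with correlation rho."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n_members):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rng.gauss(0.0, 1.0)
        a = mean_a + std_a * z1
        # Mix z1 into b so that corr(a, b) equals rho (2x2 Cholesky factor).
        b = mean_b + std_b * (rho * z1 + math.sqrt(1.0 - rho ** 2) * z2)
        pairs.append((a, b))
    return pairs


# Hypothetical parameter ensemble with correlation 0.7.
parameter_ensemble = sample_correlated_pair(
    0.0, 1.0, 5.0, 2.0, rho=0.7, n_members=20000, seed=42)
```

Sampling each parameter independently instead would produce
members whose parameter combinations may be physically
inconsistent, which is the difficulty noted above.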

3.3.4 Constructing reliable multi-model ensembles 

The multi-model ensemble strategy is a means to address 
model structure uncertainty by synthesizing outcomes from 
multiple models representing different parameterizations of 
underlying physical processes and has been demonstrated to 
offer better predictability (Hagedorn et al., 2005). The success
of a multi-model strategy requires constructing a reliable
model ensemble such that the differences among the indi- 
vidual models represent the uncertainty in simulating natural 
processes. This can be accomplished by using multi-physics
model toolboxes such as the Framework for Understanding
Structural Errors (FUSE) approach, which provides an effec- 
tive means to construct multiple unique models by combin- 
ing the different options for the model architecture and the 
flux equations (Clark et al., 2008b). For example, to develop 
an empirically-based surface water model, multiple options 
are available for the choice of state variables in the unsatu- 
rated and saturated zones, as well as the choice of flux equa- 
tions describing surface runoff, interflow, vertical drainage 
from the unsaturated zone, base flow, and evaporation (Clark 
et al., 2008b). Niu et al. (2011) recently reported a similar 
multi-parameterization approach within the Noah land sur- 
face model framework (Noah-MP). 

It is important to note that, although the multi-model en- 
semble approach is widely known to increase predictabil- 
ity, a model ensemble (e.g., developed with the options pro- 
vided by the multi-model or multi-parameterization frame- 
works discussed above) may not represent a complete sam- 
pling of the model space. One typical challenge involved 
in such an approach is then concerned with understanding 
the dependence or independence among the models, as well 
as the relationship between the model spread and the total 
predictive uncertainty. Based on the notion of conditional 
bias, Abramowitz and Gupta (2008) introduced an innova- 
tive “model space” metric that allows measuring the dis- 
tance between models in a theoretical model space, thus 
helping to quantify how much independent information each 
model is contributing to representing the predictive uncer- 
tainty. Another typical challenge in a multi-model ensemble 
approach is concerned with developing an effective strategy 
to optimally combine the individual models to achieve en- 
hanced predictive skill and uncertainty estimation. This can 
be addressed by simple approaches such as equal weighting 
(Palmer et al., 2000) or optimal weighting (Regonda et al., 
2006). Statistically-based approaches such as linear regres- 
sion (Krishnamurti et al., 1999) and canonical variate anal- 
ysis (Mason and Mimmack, 2002) can also be employed to 
improve the predictive skill of multi-model ensembles. Re- 
cent investigations into combining the strengths of DA and 
multi-model ensembles provide another promising opportu- 
nity for simultaneously addressing model and data uncertain- 
ties (e.g., Parrish et al., 2012). Further developments along 
these lines should help to enhance the ability to quantify and 
reduce uncertainties in hydrologic DA applications. 
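
The difference between equal and performance-based weighting can
be sketched as follows. The inverse mean-squared-error weights
used here are one simple stand-in for "optimal" weighting, chosen
for illustration; they are not the method of any study cited
above, and the function names and data are hypothetical.

```python
def equal_weights(n_models):
    """Give every model the same weight."""
    return [1.0 / n_models] * n_models


def inverse_mse_weights(simulations, observations):
    """Weight each model by the inverse of its mean squared error."""
    n = len(observations)
    mse = [sum((s - o) ** 2 for s, o in zip(sim, observations)) / n
           for sim in simulations]
    # Small floor avoids division by zero for a model that fits exactly.
    inv = [1.0 / max(m, 1e-12) for m in mse]
    total = sum(inv)
    return [v / total for v in inv]


def combine(simulations, weights):
    """Weighted average of the model simulations at each time step."""
    n_steps = len(simulations[0])
    return [sum(w * sim[t] for w, sim in zip(weights, simulations))
            for t in range(n_steps)]


# Two hypothetical models evaluated against three observations.
observed = [1.0, 2.0, 3.0]
models = [[2.0, 3.0, 4.0], [3.0, 4.0, 5.0]]
weights = inverse_mse_weights(models, observed)
blended = combine(models, weights)
```

Weights fitted on a historical period share the limitation noted
earlier for any scheme derived from historic data: they need not
remain appropriate for future conditions.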

4 New measurements 

Hydrologic forecasting can potentially benefit from assimi- 
lating relevant observations, especially those not used in de- 
veloping the models. This section discusses the application 
of several types of newly emerging or still underutilized hy- 
drologic observations and the challenges and opportunities 
therein. 

4.1 Remote sensing data

Recent advances in remote sensing technologies have en- 
abled a large suite of terrestrial observations from satellites 
and various remote sensing platforms. These observations 
can be used to derive information on rainfall, evapotranspiration,
snow, soil moisture, topography, vegetation dynamics,
flooding, and the total water storage, all of which play an im- 
portant role in the hydrologic cycle. While in-situ measure- 
ments contain little information on the spatial variability of 
these important quantities, remote sensing data provide full 
spatial information on these variables (albeit sometimes at 
a very coarse resolution) and hence hold great potential for 
distributed hydrologic modeling and DA. 

4.1.1 Hydrologic observations from remote sensing 

Remotely-sensed snow observations are among the most in- 
vestigated measurements in the hydrologic research commu- 
nity. The Rutgers University Global Snow Lab (RUCL) has 
been generating snow cover measurements at varying tem- 
poral and spatial resolutions, from 1966 to present using 
the snow cover data set of the National Oceanic and At- 
mospheric Administration (NOAA). Since early 2000, the 
Moderate Resolution Imaging Spectroradiometer (MODIS) 
instrument of the National Aeronautics and Space Admin- 
istration (NASA) has also been providing daily snow maps 
at a variety of temporal and spatial resolutions (Hall et 
al., 2002). Passive microwave radiometry-based estimates of 
snow water equivalent and snow depth from several satellites 
have been generated in the past 30 yr. The Advanced Mi- 
crowave Scanning Radiometer (AMSR-E) sensor, launched 
in 2002 on board the Aqua satellite, is the most recent ad- 
dition to the passive microwave suite of instruments (Kelly, 
2009). A blended snow product, known as the Air Force 
Weather Agency (AFWA)/NASA snow algorithm (ANSA), 
has also been developed by combining the retrievals from 
both MODIS and AMSR-E (Foster et al., 2011).
DA of remote sensing snow products has been explored in 
numerous studies (e.g., Rodell and Houser, 2004; Parajka and 
Bloschl, 2008; Hall et al., 2010; Kuchment et al., 2010). 

Remote sensing products of soil moisture have also be- 
come available in recent years. For example, surface soil 
moisture has been retrieved from a number of passive sensors 
starting in 1978. These include the Scanning Multichannel Microwave
Radiometer (SMMR), the Special Sensor Microwave
Imager (SSM/I), the Tropical Rainfall Measuring Mission
(TRMM) microwave imager (TMI), the AMSR-E, the Wind- 
sat radiometer (Windsat), the European Space Agency (ESA) 
Soil Moisture and Ocean Salinity mission (SMOS, Kerr and 
Levine, 2008), and the Global Change Observation Mission 
- Water (GCOM-W). In the meantime, soil moisture prod- 
ucts have also been available from active sensors starting in 
1992, including the Advanced Scatterometer (ASCAT) and 
the two European Remote Sensing (ERS) satellites (ERS-1
and ERS-2). New sensors, such as the ESA Sentinel-1 mission
and the NASA Soil Moisture Active Passive mission
(SMAP, Entekhabi et al., 2010), will be launched in the next 
couple of years. Active and passive microwave surface soil 
moisture retrievals have been generated (e.g., Jeu, 2003; Owe 
et al., 2008; Li et al., 2010; Liu et al., 2012) and examined 
in numerous DA studies, although generally in the context 
of land surface models used in weather forecasting (e.g., Re- 
ichle et al., 2004; Balsamo et al., 2007). Examples of more 
conventional hydrologic applications also exist and have demonstrated
skill in improving streamflow estimates (e.g., Pauwels
et al., 2001; Parajka et al., 2006; Brocca et al., 2010, 2012). 
The NASA/German Gravity Recovery and Climate Experi- 
ment (GRACE) satellite (launched in 2002) can map Earth’s 
gravity field with enough accuracy to discern month to month 
changes in the distribution of the total terrestrial water storage
on Earth (Tapley et al., 2004). Despite its coarse spatial
(> 150 000 km^2 at mid-latitudes) and temporal (10 days
or more) resolutions, GRACE has been used to effectively
measure changes in groundwater, deep soil moisture as well 
as snowpack in some DA studies (e.g., Zaitchik et al., 2008; 
Su et al., 2010; Forman et al., 2012). 

Various types of hydraulic information, such as the extent, 
elevation, slope, mass and velocity of surface water bodies, 
river discharge, as well as river bathymetry, can also be ob- 
served from space (e.g., Alsdorf et al., 2007). For example, 
surface water extent can be measured with visible sensors 
such as MODIS and Landsat and by synthetic aperture radar 
(SAR) imagery; surface water elevations have been measured 
with radar altimetry on-board the TOPEX/Poseidon (T/P) 
satellite (Birkett, 1995), the ERS-1 and ERS-2 missions, and 
more recently the Envisat (Frappart et al., 2006) and Jason- 
1 missions, and the Ice, Cloud, and land Elevation satellite 
(ICEsat; Schutz et al., 2005), as well as the Shuttle Radar 
Topography Mission (SRTM) (Farr et al., 2007). The inter- 
national Surface Water Ocean Topography mission (SWOT) 
is planned to be launched in 2019 to produce high-resolution 
observations of water elevations of the Earth surface (Alsdorf 
et al., 2007). Hydraulic information such as water elevation 
and its spatial and temporal variability is critical for short-term
hydrologic forecasting, especially during flooding situations.
The potential of assimilating spaceborne water level
information for improved discharge and water depth estima- 
tion has been explored in a few studies (e.g., Andreadis et al., 
2007; Durand et al., 2008; Neal et al., 2009; Matgen et al., 
2010; Biancamaria et al., 2011). 

In addition, satellite and airborne remote sensing data have 
also been used to develop model inputs, such as precipitation, 
land classification maps, digital elevation models, land sur- 
face property maps and spatial model parameterizations, and 
to evaluate the outputs of hydrologic models - some of which 
are used in operational systems (see van Dijk and Renzullo, 
2011 for a review). In summary, there can be little doubt that 
remote sensing provides information relevant to hydrologic 
forecasting.

4.1.2 Challenges and opportunities

Operationally, satellite observations have been routinely as- 
similated into numerical weather prediction models since the 
early 1970s (Tracton and McPherson, 1977). The first oper- 
ational remote sensing application in hydrology occurred in 
the 1980s (Ramamoorthi, 1983). However, despite the recent 
increasing availability of remote sensing data, their applica- 
tion in operational hydrologic forecasting is still very limited. 

Many experimental studies exploring the use of satellite 
data have been reported in recent literature. However, most 
such studies either merely referred to the potential or util- 
ity of satellite DA, or were focused on the development of 
approaches to assimilate hydrologic observations into land 
surface models (LSM) used in numerical weather forecast- 
ing. A major difference between LSMs and “conventional” 
hydrologic models is that the former include a full descrip- 
tion of the radiation and coupled surface water and energy 
balance at diurnal time scales and - when coupled to an at- 
mospheric model - are able to consider the effect of atmo- 
spheric transmissivity on sensor observations. These features 
make it easier to assimilate satellite land surface tempera- 
ture and microwave brightness temperature observations. A 
conventional hydrologic model normally requires consider- 
able modifications or extensions to be amenable for assimi- 
lating satellite-based observations. This can include, for ex- 
ample, the coupling of a radiative transfer and energy bal- 
ance model to assimilate remotely-sensed thermal and mi- 
crowave emissions. One of the first studies attempting assim- 
ilation of satellite observations into such a “conventional” 
hydrologic model was Ottle and Vidal-Madjar (1994), who 
used land surface temperature (LST) and Normalized Difference
Vegetation Index (NDVI) products derived from Advanced Very
High Resolution Radiometer (AVHRR) observations to update a
rainfall-runoff model. Houser et al. (1998) were among the first
to use brightness temperature to improve soil moisture
estimation in a distributed hydrologic
model. More straightforward for hydrologic models is the 
assimilation of satellite derived products. Published exam- 
ples include the assimilation of satellite-derived soil mois- 
ture, evapotranspiration, vegetation properties and GRACE- 
derived terrestrial water storage (see van Dijk and Renzullo, 
2011 for references). 

A major challenge in assimilating remotely sensed hydrologic
data concerns the “mapping” between observed and modeled
variables. The spatial and temporal characteristics of these
variables are rarely identical, and therefore aggregation or
disaggregation of either (or both) is required. A specific
problem arises when the remote sensing
resolution is much coarser than that of the model, a par- 
ticular issue for the assimilation of passive microwave and 
GRACE observations. As another example, satellite obser- 
vations of surface radiances may not help estimate hydro- 
logic processes that occur within small areas below satellite 
resolution (such as runoff from saturated zones). While
approaches can be and have been developed to deal with such
issues (e.g., Zaitchik et al., 2008), they tend to be observation
specific and hence not generically available; also, they do not
overcome the fundamental lack of information on spatial
patterns at scales finer than the observation footprint. This can
affect the efficiency of DA when assimilating coarse remote
sensing data into relatively high resolution models, presenting
both challenges and opportunities for realizing the full
potential of DA in such applications. Conceptual “mapping”
can also be a problem. For example, most remote sensing soil 
moisture products reflect the water status of a very shallow 
top layer of the soil, whereas hydrologic models typically 
simulate the water storage of a deeper soil column. Appropri- 
ate DA approaches for assimilating derived satellite products 
(e.g., Li et al., 2012) can mitigate but not entirely avoid this 
limitation. Assimilation of satellite radiance (i.e., brightness
temperature) observations may also help to mitigate the
conceptual mapping issue and has been shown to result in improved
atmospheric and hydrologic predictions (e.g., Masahiro et al.,
2008; DeChant and Moradkhani, 2011a).
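
The spatial side of the mapping problem can be illustrated with a minimal sketch of an aggregating observation operator. The function name, the block-averaging rule and the soil moisture values are assumptions made for illustration only; real footprints have sensor-specific weighting patterns.

```python
def aggregate_to_footprint(fine_grid, block):
    """Average a fine model grid (list of rows) over block x block cells.

    This mimics a coarse satellite footprint seen by a hypothetical,
    uniformly weighted observation operator.
    """
    n_rows, n_cols = len(fine_grid), len(fine_grid[0])
    if n_rows % block or n_cols % block:
        raise ValueError("grid dimensions must be multiples of the block size")
    coarse = []
    for r0 in range(0, n_rows, block):
        row = []
        for c0 in range(0, n_cols, block):
            cells = [fine_grid[r][c]
                     for r in range(r0, r0 + block)
                     for c in range(c0, c0 + block)]
            row.append(sum(cells) / len(cells))
        coarse.append(row)
    return coarse


# Two different fine-scale soil moisture patterns (illustrative values)
wet_patch = [[0.40, 0.10], [0.10, 0.10]]
uniform = [[0.175, 0.175], [0.175, 0.175]]

coarse_wet = aggregate_to_footprint(wet_patch, 2)
coarse_uniform = aggregate_to_footprint(uniform, 2)
```

Both patterns map to (numerically) the same footprint average, which is why a coarse observation alone cannot constrain the sub-footprint pattern, such as a small saturated zone.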

Another challenge is the specification of uncertainty in re- 
mote sensing observations, which is a prerequisite for formal 
DA. Because of a general lack of accurate information on the 
magnitude and structure of these errors, usually simplistic 
assumptions are made. This is a pragmatic solution but can 
lead to large errors. Such errors are more likely if retrieved 
variables (e.g., soil moisture) are used rather than primary
observations (e.g., brightness temperature). This argues for the
assimilation of primary data, which, however, shifts the
potential for inappropriate error specification to the biophysical
and observation models. A promising intermediary approach 
might be to produce spatially and temporally explicit error 
estimates, either as part of the remote sensing product re- 
trieval process (Pathe et al., 2009) or through statistical com- 
parison to alternative estimates where errors are independent 
(Dorigo et al., 2010; Tian and Peters-Lidard, 2010; Liu et al., 
2012). The limited lifetime and changing sensor characteristics
of successive satellite missions (e.g., those for measuring
surface soil moisture discussed in Sect. 4.1.1) also pose
a challenge. Uncertainty in future data availability and
characteristics affects the operational prospects of DA (although
ASCAT is now assimilated in some weather forecasting mod- 
els; Dharssi et al., 2011). Fortunately, these constraints have 
been relaxed by the development of simple but robust meth- 
ods to merge successive active and passive retrievals into a 
single harmonized product in a way that is easily extended to 
future missions (e.g., Liu et al., 2012). 
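
As a hedged illustration of the statistical comparison of estimates with independent errors, the sketch below uses the covariance form of a technique commonly known as triple collocation. Naming this technique is an assumption on our part, since the text above does not name the method. The synthetic "truth" and the error levels are invented for the example, and the estimator assumes three collocated data sets with zero-mean, mutually independent errors that are linearly related to the same signal.

```python
import random


def cov(a, b):
    """Sample covariance of two equally long sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / (len(a) - 1)


def triple_collocation_error_variances(x, y, z):
    """Estimate the error variance of each data set from cross-covariances."""
    c_xy, c_xz, c_yz = cov(x, y), cov(x, z), cov(y, z)
    var_ex = cov(x, x) - c_xy * c_xz / c_yz
    var_ey = cov(y, y) - c_xy * c_yz / c_xz
    var_ez = cov(z, z) - c_xz * c_yz / c_xy
    return var_ex, var_ey, var_ez


# Synthetic experiment (all numbers are assumptions for illustration)
rng = random.Random(42)
truth = [rng.gauss(0.25, 0.08) for _ in range(20000)]
x = [t + rng.gauss(0.0, 0.02) for t in truth]  # e.g., an active retrieval
y = [t + rng.gauss(0.0, 0.04) for t in truth]  # e.g., a passive retrieval
z = [t + rng.gauss(0.0, 0.03) for t in truth]  # e.g., a model estimate

estimated = triple_collocation_error_variances(x, y, z)
```

The estimated variances can then serve as observation error variances in a DA scheme, subject to the independence assumption holding for the data sets at hand.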

Finally, the engineering requirements and infrastructure 
required for operational satellite DA are currently probably 
prohibitive for many applications. For example, the obser- 
vation matrix might become very large for remote sensing 
data, making inversion of the observation matrix a challenge. 
Also, parameter optimization and state updating can intro- 
duce considerable computational overheads, even more so 
when large satellite data volumes are involved and an iterative
solution is required. In these cases, the LETKF method (Ott
et al., 2004) used in the meteorological community may 
represent a viable candidate for dimension reduction to fa- 
cilitate implementation in operational hydrologic forecast- 
ing. The current generation of computing infrastructure in 
most operational weather forecast centers can support in- 
tensive calculations required by satellite DA. Most opera- 
tional hydrologic forecast centers, however, lack the required 
computing support to implement such intensive calculations. 
Moreover, there is currently no computer software to support 
spatiotemporal grid-based DA to the extent that it only re- 
quires “minor” software engineering investment to achieve 
the coupling of models, data, DA techniques and exploita- 
tion of high performance computing solutions in the opera- 
tional forecasting process. Several potential components of a 
future solution have been or are being developed, through 
such initiatives as the Land Information System (Kumar 
et al., 2008a, b), OpenMI (www.openmi.org) and OpenDA 
(www.openda.org). More discussion of these community-oriented
software packages is included in Sect. 6.
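
To make the dimensionality issue raised in this paragraph concrete, the standard Kalman analysis step can be written as follows. This is the textbook form rather than a formulation taken from this paper, and the symbols (forecast state x^f, forecast error covariance P^f, observation operator H, observation error covariance R, observation vector y) are notation assumed here.

```latex
\begin{aligned}
\mathbf{K} &= \mathbf{P}^f \mathbf{H}^{\mathsf T}
  \left(\mathbf{H}\mathbf{P}^f\mathbf{H}^{\mathsf T} + \mathbf{R}\right)^{-1}, \\
\mathbf{x}^a &= \mathbf{x}^f + \mathbf{K}\left(\mathbf{y} - \mathbf{H}\mathbf{x}^f\right).
\end{aligned}
```

With m observations, the matrix to be inverted is m by m, which becomes expensive for dense satellite data. Localized ensemble schemes such as the LETKF instead carry out many independent local analyses, each using only nearby observations and working in the space spanned by the ensemble members, whose number is typically far smaller than m.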

In summary, some of the main challenges to successful as- 
similation of remote sensing data in hydrologic forecasting 
are related to the model extensions required, the mapping of 
observations to model variables, the specification of model 
and observation errors, and - for operational implementa- 
tion - near-real time access to remote sensing data services 
and the design and configuration of operational DA systems. 
Considering these challenges, it is perhaps not surprising 
that there are currently few hydrologic operational systems 
that have cleared all these hurdles successfully. Nonetheless, 
with the increasingly rapid progress being made to address 
these challenges, there is every reason to be optimistic about the
future of satellite DA in operational hydrologic forecasting. 

4.2 Other new or underutilized observations 

Besides the remote sensing products discussed above, there 
are other new or underutilized observations worthy of
exploration for hydrologic DA applications. Briefly discussed
below are two types of such observations: cosmic ray data of soil
moisture and eddy covariance measurements of the turbulent 
fluxes between the land surface and the atmosphere. 

Neutron activity measured by cosmic ray devices provides 
an interesting new data source for soil moisture contents for 
the intermediate scale of several hectares (Zreda et al., 2008). 
Cosmic ray data may hold considerable potential for hydro- 
logic forecasting applications by providing temporally con- 
tinuous in-situ information on soil moisture content from a 
larger part of the root zone (up to one meter under dry
conditions) over a larger area than time-domain reflectometry
(TDR) or other existing probes. However, to properly assim- 
ilate these data, various issues remain to be solved, includ- 
ing properly estimating the measurement errors in the soil 
moisture content data under various conditions. For example, 
since the neutron intensity is less dependent on soil moisture
contents under wet conditions, the cosmic ray measurements
under wet conditions are associated with a larger uncertainty. 
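
The growth of uncertainty under wet conditions can be illustrated with first-order error propagation through a calibration curve. The hyperbolic form below, its coefficients, the dry-soil count rate N0 and the Poisson counting-noise model are all assumptions made for this sketch; none of them is taken from the text above.

```python
import math

# Assumed shape coefficients of the calibration curve (illustrative only)
A0, A1, A2 = 0.0808, 0.372, 0.115


def soil_moisture(counts, n0):
    """Water content from neutron counts: theta = A0 / (N/N0 - A1) - A2."""
    return A0 / (counts / n0 - A1) - A2


def soil_moisture_std(counts, n0):
    """Propagate Poisson counting noise, sd(N) = sqrt(N), to theta."""
    dtheta_dn = -A0 / (n0 * (counts / n0 - A1) ** 2)
    return abs(dtheta_dn) * math.sqrt(counts)


N0 = 3000.0            # assumed count rate over dry soil
dry_counts = 2400.0    # high neutron intensity, dry soil
wet_counts = 1650.0    # low neutron intensity, wet soil

theta_dry, theta_wet = soil_moisture(dry_counts, N0), soil_moisture(wet_counts, N0)
std_dry, std_wet = soil_moisture_std(dry_counts, N0), soil_moisture_std(wet_counts, N0)
```

Because the curve flattens in terms of counts as the soil wets, the same counting noise translates into a several times larger soil moisture error in the wet case, so a state-dependent observation error would be a natural choice when assimilating such data.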

Eddy covariance data measure the turbulent exchange 
fluxes (of water, energy, and carbon dioxide) between the 
land surface and the atmosphere and an extensive global net- 
work of eddy covariance “flux towers” now exists (Baldocchi 
et al., 2001). The assimilation of such data opens an opportu- 
nity for improving predictions with land surface models and 
integrated hydrologic models. An example for parameter es- 
timation can be found in Mo et al. (2008). However, turbu- 
lent flux measurements cannot be directly used for assimi- 
lation because in general there is an energy balance gap in 
the data that needs to be corrected before assimilation. Ad- 
ditional complications may arise from non-Gaussian random 
errors and their dependence on the flux magnitude or even 
the season, as well as issues related to heterogeneity within 
the footprint of eddy covariance measurements. 

The above observation types overcome some of the spa- 
tial scaling issues associated with in-situ hydrological mea- 
surements and are helpful in model parameter estimation. 
However, in the context of DA, these are still essentially lo- 
cal measurements that do not cover the full spatial domain 
of typical distributed hydrologic model applications.
Appropriate DA techniques are needed to infer and use spatial
covariance information from spatially distributed models or
ancillary spatial observations (e.g., airborne or satellite data). 


5 DA and real-time control 
5.1 Background 

Although not well known in the hydrologic research commu- 
nity, DA techniques are also one of the important building 
blocks of Real-time Control (RTC) applications. Examples 
of such applications include the definition of minimum re- 
leases for reservoirs depending on the reservoir’s water level 
and environmental objectives, and the operation of flood de- 
tention basins based on water levels at reference locations 
(e.g., Castelletti et al., 2008). In RTC applications, a dynamic 
system of a hydraulic or water resources structure can be 
defined as a set of state variables driven by a set of inputs, 
often divided into controlled inputs (or controls) and non- 
controlled inputs (or disturbances). The controlled and non- 
controlled inputs are analogous to model parameters and in- 
puts for a hydrologic system, respectively. The overall objec- 
tive of the optimal control of such a system is to determine 
the controls that will cause the system states to satisfy a set 
of physical constraints given the deterministic or stochastic 
disturbances, while at the same time minimizing (or maxi- 
mizing) some performance criterion (Kirk, 2004). 
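
In compact form, the optimal control problem described above may be written as follows. The notation is assumed here for illustration: x_k denotes the state, u_k the control and d_k the disturbance at time step k, f is the process model, g the stage cost, h the constraint function and N the length of the horizon.

```latex
\begin{aligned}
\min_{u_0,\dots,u_{N-1}} \quad & J = \sum_{k=0}^{N-1} g(x_{k+1}, u_k) \\
\text{subject to} \quad & x_{k+1} = f(x_k, u_k, d_k), \\
& h(x_{k+1}, u_k) \le 0, \qquad k = 0,\dots,N-1 .
\end{aligned}
```

The initial state x_0 enters as given data, which is where DA contributes: a better estimate of x_0 improves every predicted trajectory in the optimization.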

Traditionally, the common technique for supervisory control
of water resources systems is the definition of offline,
reactive operating rules that optimize the controls to minimize
the cost of operations. An alternative to the offline technique is
the application of online optimization. The main representa- 
tive of this approach is Model Predictive Control (MPC), also 
referred to as Receding Horizon Control (RHC). MPC makes 
use of a process model of the dynamic system for predicting 
future trajectories of the state vector over a finite time hori- 
zon in order to determine the optimal set of controlled vari- 
ables by minimizing a cost function. Like hydrologic fore- 
casting, MPC can benefit from proper DA techniques (e.g., 
Kalman filtering) for improving estimates of the current sys- 
tem states that are the basis for enhanced accuracy in fore- 
casting future system states. A common DA technique jointly 
used with MPC is the Moving Horizon Estimation (MHE) 
approach that looks back into the past for updating current 
system states by modifying historical inputs, states or model 
parameters. The optimization problem of nonlinear MPC and 
MHE schemes can be solved by nonlinear programming al- 
gorithms such as Sequential Quadratic Programming (SQP) 
(e.g., Wachter and Biegler, 2006). 
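
The receding horizon principle can be sketched for a toy reservoir as follows. The water balance model, the quadratic cost, the discrete release options and the exhaustive search are assumptions chosen to keep the example short; an operational scheme would use a hydraulic process model and a nonlinear programming solver as described above.

```python
from itertools import product


def simulate_storage(storage, releases, inflows):
    """Process model: storage[k+1] = storage[k] + inflow[k] - release[k]."""
    trajectory = []
    for release, inflow in zip(releases, inflows):
        storage = storage + inflow - release
        trajectory.append(storage)
    return trajectory


def mpc_step(storage, inflow_forecast, options, target):
    """Search all release plans over the horizon; return the first release."""
    best_cost, best_plan = None, None
    for plan in product(options, repeat=len(inflow_forecast)):
        trajectory = simulate_storage(storage, plan, inflow_forecast)
        cost = sum((s - target) ** 2 for s in trajectory)
        if best_cost is None or cost < best_cost:
            best_cost, best_plan = cost, plan
    return best_plan[0]


def run_mpc(storage, inflows, horizon, options, target):
    """Apply only the first optimized release, then shift the horizon."""
    history = []
    for k in range(len(inflows) - horizon + 1):
        release = mpc_step(storage, inflows[k:k + horizon], options, target)
        # In practice the state would be updated here by DA (e.g., a Kalman
        # filter or MHE); this sketch simply reuses the model as the "truth".
        storage = storage + inflows[k] - release
        history.append((release, storage))
    return history


history = run_mpc(storage=100.0, inflows=[8, 12, 20, 6, 4, 4], horizon=3,
                  options=(0.0, 5.0, 10.0, 15.0), target=100.0)
```

The comment inside `run_mpc` marks the point at which a state estimate from DA would replace the modeled storage before the next optimization.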

A DA-based MPC framework typically adopts either a “si- 
multaneous” or “sequential” approach. In a simultaneous ap- 
proach, optimization and simulation (with state updating) are 
performed simultaneously. In a sequential approach, each it- 
eration of the optimization consists of a sequential simulation 
of the system with an appropriate numerical integration and 
optimization (or state updating). In this case, the optimiza- 
tion problem has a reduced variable space compared to the 
simultaneous approach and leads to valid state trajectories in 
each iteration step of the optimization. Using one or the other 
approach has certain advantages which are discussed in Diehl 
et al. (2009). 

5.2 DA applications in MPC 

Although the application of MPC to water resources systems
has been a subject of research for at least 15 yr,
stakeholders have been conservative in applying the technique for
full automation of their systems. Ackermann et al. (2000) 
presented one of the early examples on the control of a 
run-of-river hydropower plant. In this approach, a linearised 
one-dimensional Saint-Venant model in combination with
quadratic objective functions is used for balancing the damp- 
ing of discharge peaks against the deviation of a water level 
in the head barrage of the German river Moselle. The approach
is reported to have been running successfully for more than
10 yr. Other applications of MPC to run-of-river hydropower
plants were investigated by Glanzmann et al. (2005), and
more recently by Setz et al. (2008) and Sahin (2009). 
Most applications represent the river section by simple pool 
models, where the need for sophisticated DA methods is 
relatively low. 

Low-land water systems with highly interconnected and 
highly regulated river and canal networks have been the fo- 
cus of several studies in recent years and DA-based MPC 
seems to be a promising candidate for coordinated control 
of these systems. Van Overloop et al. (2008) present the
application of MPC to the drainage of Dutch polder systems
using MHE for state updating. Several DA techniques were 
applied to the control of a river weir in the Dutch delta of the 
rivers Rhine and Meuse in Schwanenberg et al. (2011); due 
to the small extension of the modeled river system, simple 
autoregressive (AR) error correction models in combination 
with MHE were found to outperform exclusive state updating 
techniques. Nelly et al. (2011) compared sequential Kalman
filter and sequential particle filter state observers in the
context of MPC of irrigation canals. Both approaches appear
efficient and robust: the Kalman filter is very fast in terms of
calculation time and convergence, whereas the particle filter has
advantages in handling nonlinear features of
the model. Breckpot et al. (2010) and Blanco et al. (2010)
applied MPC in combination with MHE to a regional water 
system in Belgium. Bauser et al. (2010) applied EnKF to up- 
date the model states in the real-time control of a groundwa- 
ter well field, using a groundwater flow-mass transport model 
and a cost function that was based on fuzzy decision rules
and minimized with genetic algorithms. Furthermore, MPC 
has been used in the short-term decision-support and super- 
visory control of reservoir systems, as well as the automa- 
tion of irrigation systems (e.g., van Overloop et al., 2008; 
Negenbom et al., 2009; Kearney et al., 2011). 
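
The autoregressive error correction mentioned above can be sketched in its simplest, first-order form. The function names and the water level values are hypothetical, and the sketch does not reproduce the configuration of any of the cited studies.

```python
def fit_ar1(residuals):
    """Least-squares estimate of phi in e[t] = phi * e[t-1] + noise."""
    numerator = sum(a * b for a, b in zip(residuals[1:], residuals[:-1]))
    denominator = sum(b * b for b in residuals[:-1])
    return numerator / denominator


def corrected_forecast(model_forecast, last_residual, phi):
    """Add the decaying residual phi**k * e[t] at lead times k = 1, 2, ..."""
    return [value + last_residual * phi ** (lead + 1)
            for lead, value in enumerate(model_forecast)]


# Hypothetical observed and simulated water levels up to the forecast time
observed = [10.0, 10.8, 11.5, 12.1, 12.4]
simulated = [9.0, 10.0, 10.9, 11.6, 12.0]
residuals = [o - s for o, s in zip(observed, simulated)]

phi = fit_ar1(residuals)
raw_forecast = [12.3, 12.5, 12.6]
forecast = corrected_forecast(raw_forecast, residuals[-1], phi)
```

The correction acts on the model output rather than on the model states, which is one reason why it is simple to apply to small systems with short response times.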

5.3 Challenges and opportunities 

Most MPC implementations above follow the simultaneous
approach, which reflects the general tradition in control
engineering. From the hydrologist’s point of view, however, the
sequential approach may be more attractive because it
decouples simulation and optimization. This also allows the MPC
model and its integration scheme to be used in a pure
simulation mode. No comprehensive analysis is available on the
relative performance of the two approaches when applied to water
resources systems. Therefore, a more elaborate and systematic
analysis of different control options in application to water systems
should be undertaken.

As in variational DA approaches, most sequential MPC
schemes use reverse adjoint modeling for computing the
gradient of an arbitrary objective function with respect to the
controlled input (e.g., Schwanenberg et al., 2011). The
computational costs are independent of the dimension of the gradient
vector and on the order of a model simulation itself, which
enables the operational application of MPC. Besides its
application in MPC or MHE, adjoint modeling has been applied
to more sophisticated hydrologic models, e.g., in Castaings 
et al. (2009) for a first order sensitivity analysis of an over- 
land flow model. We believe that this technique has much 
unexploited potential in operational flood forecasting, since 
it provides significantly more information to a user than just 
simulation results. 
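
A minimal sketch of the adjoint idea, assuming a toy linear reservoir and a quadratic objective (neither taken from the cited studies), shows how one backward sweep yields the gradient with respect to every control, whereas finite differences need one extra simulation per control.

```python
def simulate_reservoir(s0, controls, inflows, a):
    """Forward model: s[k+1] = a * s[k] + inflow[k] - control[k]."""
    states = [s0]
    for u, q in zip(controls, inflows):
        states.append(a * states[-1] + q - u)
    return states


def objective(states, target):
    """Sum of squared deviations of s[1..N] from the target storage."""
    return sum((s - target) ** 2 for s in states[1:])


def adjoint_gradient(states, target, a):
    """One backward sweep returns dJ/du[k] for every control u[k]."""
    n = len(states) - 1
    gradient = [0.0] * n
    lam = 0.0  # adjoint variable: total derivative of J w.r.t. s[k]
    for k in range(n, 0, -1):
        lam = 2.0 * (states[k] - target) + a * lam
        gradient[k - 1] = -lam  # because ds[k]/du[k-1] = -1
    return gradient


def finite_difference_gradient(s0, controls, inflows, a, target, eps=1e-6):
    """Reference gradient: one additional simulation per control."""
    base = objective(simulate_reservoir(s0, controls, inflows, a), target)
    gradient = []
    for k in range(len(controls)):
        bumped = list(controls)
        bumped[k] += eps
        cost = objective(simulate_reservoir(s0, bumped, inflows, a), target)
        gradient.append((cost - base) / eps)
    return gradient


controls, inflows = [5.0, 5.0, 5.0, 5.0], [8.0, 12.0, 6.0, 4.0]
states = simulate_reservoir(50.0, controls, inflows, a=0.95)
grad_adj = adjoint_gradient(states, target=50.0, a=0.95)
grad_fd = finite_difference_gradient(50.0, controls, inflows, 0.95, 50.0)
```

The two gradients agree to within the finite-difference truncation error, while the cost of the adjoint sweep does not grow with the number of controls.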

There are only a few applications of the MHE method
as a DA technique for hydrologic models (Linke et al.,
2011), likely due to, as for variational methods, the technical
difficulty of setting up adjoint models. Therefore, suitable
technical frameworks for facilitating the implementation of 
hydrologic models in simulation and adjoint mode, such as 
that found in Schwanenberg et al. (2011), should be fur- 
ther explored and developed. Another line of research should 
cover the application of MHE to hydrologic models and its 
performance comparison with the commonly used DA tech- 
niques such as EnKF. Additionally, the integration of un- 
certain information in the disturbance (inputs) and the pro- 
cess model should be further explored. A relatively simple 
technique is multiple model predictive control, which
optimizes a single control trajectory for multiple disturbance
scenarios (van Overloop et al., 2008). This, however,
only works in the case of relatively close disturbance
trajectories. A novel approach without this limitation is
tree-based MPC, where a scenario tree is constructed for both
the disturbance and control trajectory to enable adaptive con- 
trols (Raso et al., 2010). Furthermore, new opportunities for 
overcoming the challenges encountered in RTC applications 
(and those in hydrologic forecasting) may start to emerge if 
the RTC and hydrologic communities work together to learn 
from each other’s experiences in incorporating DA in their 
relevant applications. 

6 Community-based efforts 
6.1 Motivation 

Automated DA techniques are widely used in research and 
operations in areas like meteorology and oceanography. De- 
spite the fact that the Monte Carlo type filters are model in- 
dependent, most implementations of these DA methods in 
hydrologic research are custom implementations specially 
designed for (and integrated with the code of) a particular 
model. The use of custom implementations has a number of 
disadvantages. For example, it is very time consuming and 
expensive to develop and implement customized DA meth- 
ods and tools for every specific model and application. It is 
also difficult to reuse these customized DA methods or tools
for models and applications other than those for which they were
originally developed or implemented (i.e., there is an
incompatibility or transferability issue). This, to some extent, has
hindered the advance of automated DA tools in operational 
hydrologic forecasting. 

As mentioned earlier, the improvement in numerical 
weather prediction over the last couple of decades has been 
enabled to a large degree by the development of community- 
based, generic modeling and DA frameworks and tools, 
which can effectively facilitate the transition from research 
to operations as well as from operations to research. It is 
expected that operational hydrologic forecasting can bene- 
fit from similar community-based research and development 
efforts. In the hydrologic community, such efforts are starting 
to emerge, albeit not as mature or established as those in the 
weather forecasting world. For example, the open
service-oriented infrastructure of the Flood Early Warning System
(Delft-FEWS, Werner et al., 2012) developed at Deltares
(http://www.deltares.com) has been adopted by various op- 
erational river forecast centers around the world to develop 
their next generation of operational forecast systems, includ- 
ing the Community Hydrologic Prediction System of the 
US NWS (McEnery et al., 2005). The need for community-based
efforts has also led to the recent initiative to develop
a Community Hydrologic Modeling Platform (CHyMP,
Famiglietti et al., 2008) within the context of the Consortium 
of Universities for the Advancement of Hydrologic Science, 
Inc (CUAHSI). 

One important feature of community modeling is that it 
supports, through community contributions and feedback, 
the generic implementation of various models, forcing data 
sources, parameter data sets, performance evaluation and 
other tools within a single framework. DA can benefit from 
similar community efforts by developing generic tools for 
implementing various DA algorithms and different types of 
observational data sets, to serve the diverse needs of different 
DA problems encountered in hydrologic research and opera- 
tional applications. It is expected that such community-based 
generic DA tools, when built upon a community modeling 
framework, can provide an efficient vehicle for advancing 
operational hydrologic forecasting and DA. 

6.2 Existing or emerging community DA efforts 

One example of integrated modeling and DA systems in land 
surface hydrology is the Land Information System (LIS) de- 
veloped at NASA (Kumar et al., 2006, 2008a, b). The LIS is 
a flexible land surface modeling and DA framework designed 
to integrate satellite- and ground-based observational data 
products, various land surface and hydrologic models, and 
advanced DA techniques to produce optimal fields of land 
surface states and fluxes. It features a high performance com- 
puting infrastructure that provides adequate support for per- 
forming computationally intensive data integration and DA 
applications over user-specified regional or global domains. 
In atmospheric sciences, a well-known package is the Data 
Assimilation Research Testbed (DART,
http://www.image.ucar.edu/DAReS/DART/). DART is an open-source
community framework for DA developed at the National Center for
Atmospheric Research (NCAR). It contains advanced EnKF 
implementations with features like inflation and smoothing. 
DART has been successfully linked to some large opera- 
tional models including the Community Atmospheric Model 
(CAM) and the Weather Research and Forecasting (WRF) 
regional prediction model (Anderson et al., 2009). 

A few other generic libraries exist or have been proposed. 
Nerger et al. (2005) introduced the Parallel DA Framework
(PDAF, http://pdaf.awi.de/trac/wiki) to facilitate the
implementation of ensemble DA systems in large-scale
geophysical models. It offers a number of efficient parallel
implementations of ensemble DA algorithms including the
LETKF. In addition, a MATLAB-based DA package for
hydrology was proposed by Drecourt et al. (2006). Recently,
van Velzen and co-workers (van Velzen and Verlaan, 2007; 
van Velzen and Segers, 2010) proposed COSTA, a generic 
programming environment for DA and model calibration, 
while El Serafy et al. (2007) and Weerts et al. (2010) de- 
veloped a user-oriented generic toolbox for DA. Building on 
these two previous developments, an open source initiative 
for DA (OpenDA, www.openda.org/joomla/index.php) was 
launched in 2010 to facilitate the generic implementation of 
various DA algorithms as well as calibration algorithms. 

6.3 Challenges and opportunities 

For community-based DA efforts, the main remaining challenges
concern custom implementations of DA algorithms, the noise and
error models, and computational performance.

In contrast to the situation for DA, many optimization packages
or toolboxes have been developed for generic model calibration,
and several specifically for hydrological models; a prime
example is the Parameter Estimation Toolbox (PEST, Gallagher and
Doherty, 2007). The reason for this imbalance between generic
tools for calibration and state updating is probably that state 
updating requires a high level of interaction with the nu- 
merical model. Model calibration can often be performed 
via relatively simple alterations of a model parameter file. 
One potential means of dealing with the high level interac- 
tion between model and DA algorithms is via the use of the 
open Model Interface (openMI, Moore and Tindall, 2005; 
Gregersen et al., 2007). The openMI allows models to ex- 
change data with each other on a time step by time step ba- 
sis as they run, facilitating the modeling of process interac- 
tions. This bears similarity with the exchange of model states 
between model and DA algorithms, although openMI is normally
geared not towards exchanging complete state vectors but
towards exchanging model states or fluxes at
specific locations.
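
The kind of model-DA interaction discussed above can be sketched with a small, hypothetical interface. The class and method names below are invented for illustration and do not correspond to the API of openMI, OpenDA or any other package; the update routine is a basic perturbed-observation EnKF for a single scalar observation.

```python
import math
import random
from abc import ABC, abstractmethod


class StateModel(ABC):
    """Hypothetical minimal interface a model exposes to a generic DA tool."""

    @abstractmethod
    def step(self):
        """Advance the model by one time step."""

    @abstractmethod
    def get_state(self):
        """Return the state vector as a list of floats."""

    @abstractmethod
    def set_state(self, state):
        """Overwrite the state vector."""

    @abstractmethod
    def observe(self):
        """Return the model equivalent of the scalar observation."""


class LinearReservoir(StateModel):
    """Toy model: storage drains at rate k and receives a constant inflow."""

    def __init__(self, storage, k=0.1, inflow=2.0):
        self.storage, self.k, self.inflow = storage, k, inflow

    def step(self):
        self.storage += self.inflow - self.k * self.storage

    def get_state(self):
        return [self.storage]

    def set_state(self, state):
        self.storage = state[0]

    def observe(self):
        return self.k * self.storage  # simulated discharge


def enkf_update(models, obs, obs_var, rng):
    """Update all ensemble members using only the StateModel interface."""
    n = len(models)
    states = [m.get_state() for m in models]
    preds = [m.observe() for m in models]
    pred_mean = sum(preds) / n
    pred_var = sum((p - pred_mean) ** 2 for p in preds) / (n - 1)
    innovations = [obs + rng.gauss(0.0, math.sqrt(obs_var)) - p for p in preds]
    for j in range(len(states[0])):
        mean_j = sum(s[j] for s in states) / n
        cov_j = sum((s[j] - mean_j) * (p - pred_mean)
                    for s, p in zip(states, preds)) / (n - 1)
        gain_j = cov_j / (pred_var + obs_var)
        for s, d in zip(states, innovations):
            s[j] += gain_j * d
    for m, s in zip(models, states):
        m.set_state(s)


rng = random.Random(1)
ensemble = [LinearReservoir(rng.gauss(30.0, 5.0)) for _ in range(50)]
prior_mean = sum(m.storage for m in ensemble) / len(ensemble)
enkf_update(ensemble, obs=2.0, obs_var=0.01, rng=rng)  # observed discharge
posterior_mean = sum(m.storage for m in ensemble) / len(ensemble)
```

Because `enkf_update` touches the model only through `get_state`, `set_state` and `observe`, the same routine could in principle be reused for any model that implements the interface, which is the design goal of the generic tools discussed in this section.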

Another challenge in developing generic DA tools lies in 
the definition of model noise, which often depends on the 
model and requires a great deal of model specific knowledge 
and interactions that are difficult to generalize. However, one 
could argue that this generalization is not necessary, since the 
model noise description is part of the model itself so that the 
responsibility of providing the tools and methods for error 
estimation lies with the model and not with the interfaces 
that deal with connecting model-algorithm-observations. 

Finally, although parallelism can be easily implemented to 
deal with the heavy model computation needed by the DA 
algorithm, the optimal parallelization of the DA update al- 
gorithm requires a different distribution of the data over the 
processors (distributing rows of a matrix) than the optimal 
parallelization of model computations (distributing columns 
of a matrix). Optimal results are therefore to be expected with 
a hybrid form of parallelization of the data (Roest and
Vollebregt, 2002). Parallel computing is used in DA systems like
DART, PDAF, and OpenDA. DART allows the parallelization
of the update algorithm, which is useful when a large
number of observations needs to be assimilated. In DART
the model steps can be performed in parallel as well, but the 
data interaction between the model and filter is still sequen- 
tial by files. PDAF is implemented as an extension to the 
model code; the update algorithm is therefore parallelized 
using the same approach as the model without any exchange 
by files. OpenDA automatically parallelizes the model com- 
putations and allows parallel models to be used without file 
interaction; implementation of the parallelization of the up- 
date algorithms is planned for the near future. 
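
The "easy" part of the parallelization, namely the independent propagation of ensemble members, can be sketched as follows. The toy model and function names are assumptions, and a thread pool is used only to keep the example self-contained; none of this reflects how DART, PDAF or OpenDA are implemented, and a production system would rather distribute members over processes or compute nodes.

```python
from concurrent.futures import ThreadPoolExecutor


def propagate(storage, inflow=2.0, k=0.1, n_steps=24):
    """Advance one ensemble member of a toy linear reservoir model."""
    for _ in range(n_steps):
        storage += inflow - k * storage
    return storage


def propagate_ensemble(storages, max_workers=4):
    """Members do not interact during the forecast step, so they can run
    concurrently; results are returned in the original member order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(propagate, storages))


ensemble = [10.0, 20.0, 30.0, 40.0]
parallel_result = propagate_ensemble(ensemble)
serial_result = [propagate(s) for s in ensemble]
```

The update step is different: it needs information from all members at once, which is why its parallelization calls for a separate data distribution, as noted above.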


7 Summary and discussions 

The need for transitioning of hydrologic DA research into 
effective operations has become increasingly recognized in 
the wake of frequent occurrences of extreme events in re- 
cent years and increasing availability of new observations. 
This paper reviews the current status of DA applications 
in hydrologic research, and discusses the existing or po- 
tential challenges and emerging opportunities in transition- 
ing hydrologic DA research into effective and efficient op- 
erational forecasting tools. The discussion focuses on sev- 
eral critical aspects related to hydrologic DA and is briefly 
summarized below. 

Several theoretical or mathematical challenges need to be
addressed before hydrologic DA can fully benefit operational 
forecasting (Sect. 2). Issues include the high nonlinearity in 
hydrologic processes, the high dimensionality of the state or 
parameter vectors of hydrologic models, the skewness and 
heteroscedasticity in the probabilistic distribution of hydro- 
logic variables, the need for impractically large samples in 
ensemble approaches, and the limited observations of ex- 
treme events. Emerging opportunities include localized and 
transformation-based ensemble approaches and decomposition
of the hydrologic forecast system into smaller subcomponents
for separate DA solutions. It is recommended that
bias correction precede or accompany DA applications. 

The success of a DA scheme depends critically on the 
characterization of uncertainties (Sect. 3). Uncertainty in pre- 
cipitation can be quantified by stochastically perturbing pre- 
cipitation inputs or through conditional simulation methods. 
Determining model uncertainty can be complicated by inter- 
actions among uncertainty sources, poorly constrained infer- 
ence problems, and difficulty in constructing a reliable multi- 
model ensemble. These issues can be addressed by disentan- 
gling uncertainty sources, intelligent use of available data for 
inverse modeling, using integrated multi-model and multi- 
parameterization frameworks, and combining the strength of 
DA and multi-model ensembles. 
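
As an illustration of stochastically perturbing precipitation inputs, the following minimal sketch multiplies each value by a lognormal factor with unit mean. The function name, the lognormal error model, the value of sigma, and the rainfall series are assumptions made for this example; they are not taken from the studies reviewed here.

```python
import math
import random

def perturb_precipitation(precip, n_members, sigma=0.3, seed=0):
    """Return n_members perturbed copies of a precipitation series.

    Each value is multiplied by a lognormal factor with mean 1, so the
    ensemble is unbiased in expectation and zero rainfall stays zero.
    The multiplicative lognormal error model is an assumption.
    """
    rng = random.Random(seed)
    mu = -0.5 * sigma ** 2  # gives E[exp(N(mu, sigma^2))] = 1
    return [
        [p * math.exp(rng.gauss(mu, sigma)) for p in precip]
        for _ in range(n_members)
    ]

observed = [0.0, 2.5, 10.0, 4.0]  # hypothetical hourly rainfall (mm)
ensemble = perturb_precipitation(observed, n_members=500)
```

A multiplicative factor keeps the perturbed rainfall non-negative and scales the error with rainfall intensity. An additive Gaussian perturbation would have neither property.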

Hydrologic forecasting can potentially benefit from inte- 
grating, via DA, newly emerging observations such as remote 
sensing data (Sect. 4). Some of the difficulties (or emerging 
opportunities) in effectively assimilating these data include, 
for example, developing proper modifications of an oper- 
ational hydrologic model to assimilate “raw” satellite data 
(i.e., radiance observations), constructing proper “mapping” 
relationships between remotely observed and modeled vari- 
ables, providing appropriate specification of uncertainty in 
remote sensing data, and building an efficient computing 
infrastructure for retrieving remote sensing data to support 
satellite DA in operational forecasting. 

Although less well-known in the hydrologic community, 
DA has played an important role in real-time control of water 
resources systems and hydraulic structures (Sect. 5). DA for 
real-time control is often conducted within a parameter opti- 
mization framework and uses optimization-based techniques 
(rather than sophisticated DA approaches such as EnKF) for 
state updating. It is recommended that the hydrologic and 
control communities work together to learn from each other’s 
experiences to more efficiently address issues encountered in 
DA applications. 
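
One simple reading of optimization-based state updating is to choose the state that minimizes a weighted misfit to both the model background and the observation. The sketch below does this for a single scalar state using a ternary search. The cost function, the error variances, and the reservoir-level numbers are hypothetical, and the sketch is not the scheme used in any particular control system.

```python
def cost(x, background, obs, var_b, var_o):
    """Quadratic misfit to the model background and to the observation."""
    return (x - background) ** 2 / var_b + (x - obs) ** 2 / var_o

def update_state(background, obs, var_b, var_o, lo, hi, tol=1e-7):
    """Minimize the convex cost on [lo, hi] by ternary search."""
    while hi - lo > tol:
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if cost(m1, background, obs, var_b, var_o) < cost(m2, background, obs, var_b, var_o):
            hi = m2  # minimum lies left of m2
        else:
            lo = m1  # minimum lies right of m1
    return 0.5 * (lo + hi)

# Hypothetical reservoir level (m): the model says 12.0, the gauge says 12.6.
analysis = update_state(12.0, 12.6, var_b=0.04, var_o=0.01, lo=10.0, hi=15.0)
```

For this quadratic cost the minimizer is the inverse-variance weighted mean of background and observation, so the search is only a stand-in for the numerical optimizers that would be needed with nonlinear models or constraints.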

Besides the computational and technical aspects already 
discussed, other operational aspects like transparency and 
clarity of the outcomes of DA applications are at least as 
important for the uptake of automated DA methods in oper- 
ational practice. For operational forecasters who are used to 
manual interactions with the forecast system, automated DA 
of the black-box type may be too adventurous a step to take 
in the short run. Hence, efforts are needed to speed up the 
operational implementation of automated DA, by enabling 
the operational control (with manual interaction) over noise 
models and observations to be used in DA, and developing 
DA-guided systems, where the DA results will be presented 
next to the current forecast to guide operational forecasters. 

It is important to note that comprehensive and robust ver- 
ification of DA results is necessary to demonstrate the value 
of DA for operational forecasting and to build trust in DA 
among operational forecasters. One important goal of DA in 
operational forecasting is to provide an improved analysis of 
the model initial conditions to produce improved hydrologic 
forecasts. However, the link between the accurate character- 
ization of the initial conditions and the sensitivity of fore- 
cast skill at different lead times to this characterization is 
still uncertain, largely due to a lack of proper verification of 
the potential gain from DA in a forecast context. Some have 
also argued that statistically based post-processing of hydro- 
logic forecasts may outperform DA, since the latter aims at 
improving initial conditions that may not have a sufficiently 
long memory to improve forecast skill at longer lead times. 
All of these point to the need for robust forecast verification 
(e.g., Demargne et al., 2010) that will identify and quantify 
the sensitivity of forecast skill to accuracy of initial condi- 
tions and hence help quantify the value of DA for operational 
hydrologic forecasting. 
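
The kind of verification called for here can be sketched as a skill score computed separately for each lead time, comparing forecasts issued with and without DA against the same observations. The RMSE-based score, the function names, and all streamflow values below are assumptions invented for illustration; they are not results from any study.

```python
import math

def rmse(forecasts, observations):
    """Root-mean-square error between paired forecasts and observations."""
    n = len(observations)
    return math.sqrt(sum((f - o) ** 2 for f, o in zip(forecasts, observations)) / n)

def skill_by_lead_time(with_da, without_da, observed):
    """Skill score 1 - RMSE_da / RMSE_ref for each lead time.

    Each argument maps lead time (hours) to a list of values, one per
    forecast issue time. Positive skill means DA reduced the error.
    The reference RMSE is assumed to be non-zero.
    """
    return {
        lead: 1.0 - rmse(with_da[lead], observed[lead]) / rmse(without_da[lead], observed[lead])
        for lead in sorted(observed)
    }

# Hypothetical streamflow values (m3/s), invented purely for illustration.
observed = {6: [10.0, 20.0, 30.0], 48: [12.0, 18.0, 25.0]}
without_da = {6: [14.0, 24.0, 34.0], 48: [16.0, 22.0, 29.0]}
with_da = {6: [11.0, 21.0, 31.0], 48: [15.0, 21.0, 28.0]}

skill = skill_by_lead_time(with_da, without_da, observed)
```

In this invented example the skill is 0.75 at 6 h and 0.25 at 48 h, a pattern that would be consistent with initial conditions losing their influence at longer lead times.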

The various issues described above call for a 
community-based approach to hydrologic DA, which aims 
at providing a set of generic modeling, DA, and verification 
tools to serve the diverse needs of the community 
and to facilitate effective and efficient advances through 
community contribution and feedback (Sect. 6). This also 
opens a promising pathway for the cost-effective transi- 
tion of hydrologic DA research into operational forecasting 
while at the same time facilitating the communication of 
new hurdles encountered in operational DA back to the research 
community. In summary, it is recommended that the cost-effective 
transition of hydrologic DA from research to operations be 
supported by developing community-based, generic modeling and 
DA tools and by fostering collaborative efforts among modellers, 
DA researchers, and operational forecasters. 